Preface


A person using statistical data finds himself beset by many 
uncertainties. Usually, in dealing with groups which have not been 
studied, he wishes to use conclusions based on the study of one 
group. How reliable are such conclusions? On what basis shall 
he choose among the various available measures of average, spread, 
and relationship? How large a sample shall he take and what 
design shall he follow in selecting his cases?

Answers to such questions are considered in this book from the
unified point of view of the relation of sample to population. From 
this point of view, the statistical problems turn out to be matters 
of testing hypotheses, estimation, and experimental design. Per- 
sons working with statistical data often think that their interest 
is only in an analysis of the data at hand. In most cases, however, 
further reflection shows that the interest in the data at hand is due 
to its bearing on situations or groups other than those actually 
observed.

Because of this point of view, the concept of inference is introduced
in the very first chapter by use of simple examples. By
wholly intuitive reasoning the reader is led to make his own in- 
ferences without any complex mathematics, or even tables. The 
next two chapters deal with problems from a population of only
two classes. In this simple context are developed most of the con- 
cepts of statistical inference which are used throughout the book. 

The intuitive approach is used to introduce all concepts. Many 
simple numerical examples and graphic devices help to maintain 
the clarity of exposition. Per cents and proportions are considered 
first because they are the simplest summary measures with which 
one deals. Means, standard deviations, and a variety of measures 
of simple and multiple relationship are introduced later, together 
with their sampling theory. 

Diverse experimental designs are considered. The simplest is 
the comparison of two proportions and of two means. More complex
designs are described under the headings of analysis of variance
and analysis of covariance. The use of the power function in experi- 
mental design, still a novelty in textbook literature, is described. 

Elementary computations are introduced as needed in the text, 
for example, reading the normal curve in Chapter 2, computation
of mean and standard deviation of a sample in Chapter 7, computa- 
tion of regression and correlation in Chapter 10. Consequently, no 
previous course is needed by the person who has good facility in 
quantitative thinking and who handles symbolism with ease. How- 
ever, the person who works with difficulty in such materials will 
profit by having had a slow-paced introductory course in which he 
works with descriptive statistics and masters computational skills 
(and students have remarkable ability to appraise their own competences
and needs in these matters).

The book does not presuppose any college training in mathe- 
matics and does not present mathematical derivations of formulas. 
At first glance it may look more mathematical than most texts 
intended for the non-mathematician because of its use of probability 
symbolism, multiple subscripts, and the expressed limits of summa- 
tion. However, the reader is given careful explanation and practice 
exercises related to these usages. We are convinced that the effort 
of learning these symbols is more than repaid by increased clarity of 
understanding. While the text is planned for the non-mathematical 
reader, we believe that a mathematically competent student who 
intends to study the mathematical theory of statistics will profit 
by such an introductory course which consistently attempts to develop
concepts in relation to real problems.

The diversity of techniques used by the research workers in 
any field is today so great that very careful selection must be made 
in order to cover the most essential in a two-semester course. 
However, many students are able to take only a one-semester course
and subsequently find themselves in need of methods other than 
those studied, for which they must usually consult sources using 
unfamiliar style and symbolism. If the first 9 chapters of this 
book are covered in a one-semester course, the student will have 
acquired understanding of the basic concepts and will be in possession
of a text in which he can readily locate other techniques as he
finds need for them. If more class time is available, the book pro- 
vides material appropriate for a two-semester course. 

The first 10 chapters are so tightly organized that little change 
in the order of presentation is advisable. Any of the later chapters 
can be omitted, or the order shifted, without great disadvantage.
If a teacher wishes to include correlation (Chapter 10) in a one- 
semester course, the following sections might be omitted without 
affecting the clarity of the remaining material: 
Two by Two Table in Very Small Samples          pages 103-106
Test of Normality                               pages 119-123
Design of Samples in Surveys                    pages 171-177
Bartlett Test for Homogeneity of Variance       pages 192-195
Comparison of Means of Several Measures
  of the Same Individual                        pages 216-229


Many of the exercises are developmental in nature and should 
not be omitted. Answers are provided in order that students may 
appraise their own progress. Additional exercises and review 
questions will be provided by the publishers at nominal cost.

In a course in statistical inference for non-mathematical students 
a crucial problem is the treatment of probability. One familiar 
approach is to give careful instruction in the routine of reading 
tables without attempting to develop any real understanding of 
the meaning of the values obtained from such reading. Another is 
to give considerable instruction in combinatory probabilities and
the binomial distribution for its own sake using problems of the 
type commonly found in college algebra texts, without making clear 
the relevance of such work to concrete problems. The first approach 
admittedly leaves the student rigid, unable to make even slight 
adaptations in the methods he has been taught, because he does not 
really understand them. The second often frightens the non-mathematician
out of the course. We have tried to steer clear of
both pitfalls by an intuitive development of probability concepts 
as an aid to the solution of statistical problems. 

A decimal subscript attached to a statistic always relates to the 
percentile rank of that statistic, not its significance level. In this 
departure from tradition we have followed the example of Dixon 
and Massey. Experience with the new practice in six classes con- 
vinces us that it obviates the confusion produced under the usual 
practice by which t and z are given subscripts based on a two-sided
region and χ² and F subscripts based on a one-sided region.

Data for illustrative purposes have been taken from the pub- 
lished work of many persons and to each of these, named at the 
places where their data are introduced, we express our thanks. 

A modern text presents statistical theory and methods origi- 
nated by a large number of persons and discussed by various succes- 
sive writers each of whom adds some new facet to the development, 
so that a proper acknowledgment of the sources of the authors'
ideas would require an extensive historical analysis not appropriate 
for a text of this type. An unfortunate concomitant of this situation
is the tendency of some readers to ascribe credit to a textbook
writer for everything he presents. In this as in other texts, only a 
few formulas were originated by the authors and these few are not 
identified. References placed at the ends of chapters are selected 
not so much to acknowledge original sources as to acquaint the 
reader with additional material he is likely to find helpful. The 
authors gladly recognize their indebtedness to all the great writers 
and teachers of statistical theory, from whose work they have 
profited beyond all possibility of acknowledgment. 

Certain tables and charts have by permission been reproduced in 
part or in entirety from the publications of Z. W. Birnbaum, C. J. 
Clopper, H. A. David, F. N. David, R. A. Fisher, J. C. Flanagan, 
T. L. Kelley, F. J. Massey, M. Merrington, F. Mosteller, E. S.
Pearson, N. Smirnov, G. W. Snedecor, C. M. Thompson, and
F. Wilcoxon. The usefulness of any modern text depends in large 
measure upon the generosity of the authors and publishers of such 
tables and charts. Still other tables such as those of square roots, 
logarithms, and the normal curve have become almost common 
property and specific acknowledgment of the computers is well nigh 
impossible. 

For assistance in checking computations and in making the 
answer key we are glad to acknowledge the help of Ying Chan, 
Philip Jackson, and Herman Ravitch. For expert typing of difficult 
material in three preliminary mimeographed editions and preparing 
the final copy for the printer, we owe much to Gertrude Ramsdell. 
The criticisms and suggestions of the students who struggled through
three mimeographed versions are responsible for some of the helpful 
features of the text, but these students are too numerous to name 
individually. We are particularly indebted to Lincoln Moses, who 
wrote the final chapter, and who, as the result of his experience in 
using the third mimeographed version in one of his classes, pointed 
out several places which needed clarification. 


New York City                                   H. M. W.
Albany, N.Y.                                    J. L.
May 15, 1953
1 Inferences Based on Simple Experiments


It is the purpose of this chapter to introduce the reader to
some of the simpler aspects of statistical inference in an intuitive 
way. Before describing the basis for and the logic of such in- 
ference some remarks on the nature of statistical inquiry may 
be appropriate. 

The Nature of Statistical Inquiries. Problems calling for 
statistical treatment always involve empirical, or observed, evi- 
dence but not all problems involving empirical evidence are 
statistical. Factual information about a single individual is not 
statistical information. When was Shakespeare born? Is John 
taller than Sam? What does this load of coal weigh? Did the 
prisoner go to his home before or after he had been with the 
murdered man? These are factual but not statistical questions. 

A statistical question always relates to a group of individuals 
rather than to a single individual and asks what is true of the group.

Statistical inquiries are of two types. One type of inquiry 
calls only for a description of the group of individuals actually 
observed. Summary measures, or statistics, such as per cents, 
averages or measures of variability are computed from the ob- 
servations made on members of the group. The statistics are 
then used for description of this particular group, but they are 
not used as foundation for a general theory applicable to similar 
individuals that have not been examined. 

A second type of statistical inquiry is more characteristic of 
scientific investigation. It involves a search for principles which 
have some degree of generality. If an investigation deals with 
matters of any general interest, the findings are usually applied 
to a much larger domain than the cases actually observed. We 
may call that larger domain the universe or the population and 
may call the group of cases observed the sample. Now the group 
characters of the sample, that is the statistics obtained from the 
sample, are to be used to obtain information concerning the unknown
group characters of the population. Such generalization
from sample to universe is statistical inference, as contrasted with
descriptive statistics which apply only to the group from which 
they are derived. 

General conclusions inferred from limited sets of observations 
are necessarily uncertain. When we reach a conclusion by in- 
ference from sample data we do so at the risk of being in error. 
This risk can be expressed as a probability and can be given a 
numerical value. It is the purpose of this book to describe methods 
which lead to valid inferences and to calculate the risk of error 
involved in those inferences. This book is concerned with principles
and methods used in statistical inquiries from which general
principles can be derived.

Experiments in Judging Intelligence from Photographs. A 
good many people are convinced that they are able to judge the 
intelligence of a person by simply looking at his photograph. 
This ability seems to be of a kind which lends itself to study by 
experimental methods. 

A simple experiment might be to present to the subject two 
photographs of persons whose intelligence has been well deter- 
mined. In order to reduce the effect of extraneous cues, these 
should be photographs of two persons of the same age and sex, similarly
dressed and similarly posed. The subject is then asked to
arrange the photographs in order of intelligence. We may ask 
how conclusive the outcome of this experiment will be. Clearly 
there are only two possible outcomes. Either he arranges them 
correctly, or he arranges them incorrectly. Suppose he arranges 
them incorrectly. Then surely he has failed to demonstrate 
ability to judge intelligence from photographs. Suppose, how- 
ever, he arranges the photographs correctly. Are we then to 
accept the conclusion that he can judge intelligence correctly from 
photographs?

A difficulty in accepting this conclusion is the important part 
which chance could play. If the subject did not look at the photo- 
graphs at all but only at their backs, or if the photographs were 
marked A and B and he chose one letter as indicating the more 
intelligent person without looking at the photographs at all, we 
might say that he had made a chance arrangement. Under these 
circumstances the two possible outcomes (both photographs right 
or both wrong) are equally likely. The probability of each result 
by chance is therefore said to be 1/2. Since the probability of reaching
a correct result by chance is so great, when a correct result
has been reached one can have little confidence that it was due 
to some cause other than chance. 

The experiment just described is evidently not sufficiently ex- 
tensive to reveal satisfactory evidence of ability even when it 
exists. The probability of obtaining a correct result by chance 
alone can be reduced by using more photographs. If three photo- 
graphs, A, B, and C, are used there are six possible orders in which 
they can be arranged, namely, ABC, ACB, BAC, BCA, CAB and 
CBA. Only one of these orders is correct. Let us assume that 
order to be ABC. Under the hypothesis that chance alone oper- 
ates in the arrangement, each order has the same probability 
and that probability is 1/6 = .167. Consequently, if the person
whose ability is being questioned arranges these photographs cor- 
rectly, we would have some confidence that the outcome is not due 
to chance. However, the probability of arriving at a true answer 
on a chance basis is still greater than that which statisticians 
accept as desirable. 

Before proceeding with a discussion of a more extensive ex- 
periment it is worth while examining the six arrangements a little 
further. Here a correct placement will be indicated by a capital 
letter, an incorrect placement by a lower case letter. 


One arrangement is entirely correct: ABC 

Three arrangements have one correct placement: Acb, 
baC, and cBa. 

Two arrangements have no correct placement: cab and bca. 

Therefore the probability of three correct placements by
chance is 1/6; of two correct is 0; of one correct is 3/6; of
none correct is 2/6.


The elementary outcomes have now been grouped in accord- 
ance with the number of correct placements, and to each number 
of correct placements has been attributed a probability. The 
placements and their probability may be exhibited as follows: 


Number of correct
placements          Probability
3                   1/6 = .167
2                   0
1                   3/6 = .500
0                   2/6 = .333

Total               1.000
This is an example of a probability distribution of which more
will be said in the next chapter. 
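
A reader with a computer at hand may check the enumeration mechanically. The following is a minimal sketch in Python and is not part of the original text; the function name `placement_distribution` and the coding of the photographs as the integers 0, 1, 2 are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations


def placement_distribution(n):
    """Probability of each number of correct placements among n photographs,
    assuming every one of the n! arrangements is equally likely."""
    correct_order = tuple(range(n))  # hypothetical coding: photograph i belongs in position i
    counts = Counter(
        sum(placed == truth for placed, truth in zip(arrangement, correct_order))
        for arrangement in permutations(correct_order)
    )
    total = sum(counts.values())
    # List the outcomes from "all correct" down to "none correct".
    return {k: Fraction(counts.get(k, 0), total) for k in range(n, -1, -1)}


for number_correct, probability in placement_distribution(3).items():
    print(number_correct, probability, f"{float(probability):.3f}")
```

Calling `placement_distribution(4)` gives the corresponding distribution for four photographs.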

Consider now an experiment with four photographs. There 
are 24 possible arrangements of which only one is completely 
accurate. Under the hypothesis that chance only operates, the 
probability of a completely correct arrangement is 1/24 or .042.
Consequently one would be inclined to give considerable credence 
to the idea that a subject who can arrange four photographs in 
the same order of intelligence as that of the persons photographed 
does have ability to judge intelligence from photographs.

In Table 1.1 are shown the 24 possible arrangements of four 
pictures grouped in accordance with the number of photographs 
in correct position. Capital letters show photographs in correct 
position, small letters show photographs in incorrect position.
The probability which corresponds to each group is shown in the 
column at the right. 


TABLE 1.1 Classification of all Possible Arrangements of Four Photographs,
A, B, C, and D

Number                                    Number of
correctly   Possible                      arrange-
placed      arrangements                  ments       Probability
4           ABCD                          1           1/24 = .042
3                                         0           0
2           ABdc  dBCa  AcbD              6           6/24 = .250
            AdCb  cBaD  baCD
1           Acdb  bdCa  cBda  cabD        8           8/24 = .333
            Adbc  daCb  dBac  bcaD
0           badc  cadb  dabc              9           9/24 = .375
            bdac  cdab  dcab
            bcda  cdba  dcba
Total                                     24          1.000


With five photographs the number of possible arrangements 
would be 120, only one of which is entirely correct. Under the
hypothesis of pure chance all these 120 arrangements are equally
likely and so the probability of achieving a correct arrangement
by chance is 1/120 = .008.

We might go on considering the use of larger and larger num- 
bers of photographs. However, one purpose of this chapter is to 
present simple situations in which the reader can without too 
much difficulty enumerate the possible outcomes of an experiment.
As the number of photographs increases such enumeration
becomes exceedingly tedious. For 6 there are 720 possible ar- 
rangements. Therefore we shall carry this particular type of 
experiment no further. 
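
How quickly the enumeration grows can be seen from a short computation. This is only a sketch in Python, not part of the original text; the helper name `chance_of_perfect_order` is an assumption, and the calculation presumes, as above, that all arrangements are equally likely under pure chance.

```python
from math import factorial


def chance_of_perfect_order(n):
    """Probability that n photographs fall in the one correct order by chance."""
    return 1 / factorial(n)


# Number of arrangements and chance of a perfect result for 2 to 6 photographs.
for n in range(2, 7):
    print(n, factorial(n), round(chance_of_perfect_order(n), 3))
```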

Statistical Hypotheses and Their Verification. The proce- 
dures and modes of reasoning just described will be used so widely 
in the book that it is desirable to formulate them in general terms. 

For convenience the general formulation of the procedure and 
its application to the preceding problem will be displayed in 
parallel columns as a series of steps. 


General formulation                     Application to previous experiment

1. An empirical population is           1. A population of photographs of
   defined.                                persons with varying intelligence
                                           is defined.

2. A hypothesis about the population    2. The hypothesis is formulated that
   is formulated.                          any effort by the experimenter to
                                           arrange photographs in relation to
                                           intelligence will provide a result
                                           which is no better than a chance
                                           arrangement.

3. The consequences of the hypothesis   3. In view of the hypothesis the
   are formulated. For this purpose a      probability distribution in Table
   character of the sample called a        1.1 is obtained. This is the
   statistic is formulated and the         distribution of the statistic,
   probability distribution of the         “number of photographs in correct
   statistic over all possible             position.”
   samples is determined. This
   distribution is called a sampling
   distribution.

4. A random sample of elements from     4. A random sample of four
   the population is drawn and             photographs is drawn, and the
   observations are made on those          photographs are arranged by the
   elements.                               experimenter in what appears to him
                                           to be the correct order of
                                           intelligence of the persons
                                           photographed.

5. The value of the statistic is        5. The number of objects in correct
   observed in the sample.                 position is counted. This number
                                           is the observed value of the
                                           statistic.

6. The hypothesis is tested by          6. If the experiment results in all
   comparing the value of the              four photographs correctly
   statistic in the observed sample        arranged, then Table 1.1 shows that
   with the probability distribution       such a result has probability of
   for all possible samples. If the        only .042 under the hypothesis.
   observed statistic has little           Since this usually is considered a
   likelihood of occurring the             low probability the hypothesis is
   hypothesis is rejected. Otherwise       rejected.
   the hypothesis is accepted.
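
The six steps may also be sketched as a small computation. What follows is an illustration in Python, not part of the original text. It takes the sampling distribution for four photographs (1/24, 0, 6/24, 8/24, 9/24) as given, and it makes two assumptions of its own: "little likelihood" is read as the probability of a result at least as good as the one observed, and the level of .05 is supplied only as a default. The names `upper_tail_probability` and `reject_chance_hypothesis` are hypothetical.

```python
from fractions import Fraction

# Step 3: sampling distribution of the statistic "number of photographs in
# correct position" for four photographs under the chance hypothesis.
TABLE_1_1 = {
    4: Fraction(1, 24),
    3: Fraction(0),
    2: Fraction(6, 24),
    1: Fraction(8, 24),
    0: Fraction(9, 24),
}


def upper_tail_probability(observed, distribution):
    """Chance of a result at least as good as the observed one (our assumption)."""
    return sum(p for value, p in distribution.items() if value >= observed)


def reject_chance_hypothesis(observed, distribution, level=Fraction(5, 100)):
    """Step 6: reject when the observed result is unlikely under the hypothesis."""
    return upper_tail_probability(observed, distribution) <= level


# Step 5 supplies the observed value; two possible outcomes are tried here.
for observed in (4, 2):
    tail = upper_tail_probability(observed, TABLE_1_1)
    verdict = "reject" if reject_chance_hypothesis(observed, TABLE_1_1) else "do not reject"
    print(observed, tail, verdict)
```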


Now let us consider a different design which might have been 
employed. Suppose pairs of photographs are obtained such that 
each pair shows two persons of the same sex and same age, sim- 
ilarly dressed and similarly posed, but one of the two persons has 
an intelligence quotient under 85 and the other an intelligence 
quotient over 120. Some pairs are male, some female; some are 
children, some adolescents, some adults. Suppose that many 
such pairs of photographs are available, and from this supply an 
experimenter selects five pairs by some random process, and 
fastens each pair on a large cardboard. To decide which picture 
shall be placed at the right, he tosses a coin. The test is then 
administered by a different person who does not know which picture
is which, lest unconsciously he might give the subject a cue.
Under each picture is an identifying number. Looking at each 
pair of pictures in turn, the subject tries to decide which picture 
belongs to a person with high intelligence and he records the 
number of that picture. For each pair on which he makes a cor- 
rect judgment his paper is scored +, for each pair on which he is 
in error it is scored —. 

The number of possible records is 2^5 = 32. These are listed
in Table 1.2. If each of the 32 records is equally probable, each
has probability 1/32 = .031. Thus the probability of a perfectly
correct result by chance is .03. Suppose the .05 level of probability
has been agreed upon before the experiment was undertaken.
A record of 5 correct choices has probability less than
.05 and therefore gives sufficient evidence for rejecting the hypothesis
that the choices were made by chance. A record of fewer
than five correct choices is insufficient to reject the chance hypothesis.
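
The 32 records and the distribution of the number right can be listed by machine. The sketch below is in Python and is not part of the original text; the function name `number_right_distribution` is an assumption, and a wrong choice is coded with the plain character "-".

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def number_right_distribution(pairs):
    """Distribution of the number right when each choice is as likely
    to be right (+) as wrong (-) and the choices are independent."""
    records = list(product("+-", repeat=pairs))  # every possible record
    counts = Counter(record.count("+") for record in records)
    return {k: Fraction(counts[k], len(records)) for k in range(pairs, -1, -1)}


distribution = number_right_distribution(5)
for number_right, probability in distribution.items():
    print(number_right, probability, f"{float(probability):.3f}")

# Only a perfect record falls below the .05 level agreed on in advance;
# four or more right by chance is already much more probable than .05.
print(distribution[5] < Fraction(5, 100))
print(distribution[5] + distribution[4] < Fraction(5, 100))
```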
TABLE 1.2 Probability Distribution of the Number Right out of Five Choices
when Each Choice Is as Likely to be Right as to be Wrong 
(+ indicates a right choice, — a wrong one) 
2 Probability Distributions


A more formal treatment of probability and its relation
to statistical inference will be presented in this chapter. 

Observations. In any statistical study it is essential to ob- 
serve some character of the objects under consideration and to 
make a record of that observation. Suppose, for example, that 
the character is age and that the objects are persons. The record
of “observation” may take the form of a numerical measure such
as “age in years at last birthday”; it may be merely the assigning
of an individual to the appropriate one of a series of age categories; 
or it may be merely the record + or — in answer to such a question
as “Is he of voting age?” The individuals in a group may
be ranked in order of age and the numbers 1, 2, 3 . . . N assigned 
to them, and these ranks will also be called observations. Thus 
the term observation is a quite general term which includes such 
more specific terms as measure, score, category, rank, presence or 
absence of a trait. An observation is a record of information 
about an individual regardless of the form of that record. Some- 
times for the sake of brevity an individual observation may be 
called “an individual.” 

Variables. When an observation is made on a characteristic 
of an individual, it is expected that one of several possible values 
of the characteristic will be observed. The values may be cate- 
gorical, as when observations are made on sex; or may be scaled, 
as observations on age; or ordered, as A is greater than B and B
is greater than C. A characteristic which may take several values
is called a variable.

Two kinds of variables are considered in statistics, discrete 
and continuous. If a variable can take only a finite set of values 
it is called discrete. Variables like sex, marital status, or number 
of children in a family are discrete variables.

Variables like height and weight are considered continuous because
they may take any value in a continuous interval of measurement.
Because of the limitations of measuring instruments, the
actual set of possible measures is finite, even in measurements
on a continuous variable. Thus, in measurements on intelligence
only a finite number of test scores can materialize. Nevertheless,
it is convenient to deal with such variables as measures of a con- 
tinuous underlying trait. 

Sometimes it is convenient to deal with a continuous variable 
as if it were discrete. Thus one may divide ages of persons into 
groups, as, for example, ages below 40, ages 40 to 59, and ages 
60 or above. 

The first four chapters of this book deal mainly with discrete 
variables. Beginning with Chapter 5 considerable attention will 
be given to continuous variables. 

Populations Defined on a Discrete Variable. The set of ob- 
servations made in a statistical investigation may be considered as 
a sample from a larger, perhaps infinitely large, group of elements
commonly called the population or the universe, and sometimes 
called the supply. When observations are made on a discrete 
variable, the values of the variable may be thought of as classes 
into which the totality of elements is distributed. From this point 
of view of the population an observation is merely a drawing of 
an element from one of the classes which make up the population. 

The population is characterized by these classes and by the 
proportion of all elements in each class. We shall use the symbols
C_i for the ith class, and P_i for the proportion of elements of the
population contained in this class.

There are two kinds of populations. In one kind the elements 
making up the population actually exist. Such a population is 
encountered in making a survey of a community. If the variable 
studied is, say, family size, a single family is an element of the
population, and all these elements may be considered as distributed
in classes according to their numerical value. If the
community is large the study of family size may be conducted 
by use of a sample. 

Another kind of population is usually involved in experiments. 
Here the totality of elements is only conceptual and not existent. 
For such a population it is preferable to think of P_i as the chance,
or probability, that an observation is in class C_i. Thus, in throwing
a coin the population consists of the readings on all possible
throws; but since “all possible throws” can never be made, the
totality of all such readings is only conceptual. Suppose that
the class C_1 is head and the class C_2 is tail. If the coin is unbiased
and the throw is fair, then P_1 = 1/2 is the chance of obtaining a
head, and P_2 = 1/2 is the chance of obtaining a tail. The experiments
described in Chapter 1 were based on populations which
are conceptual. For these reasons the conclusions drawn from 
an experiment are conclusions about a hypothetical population 
defined by the classes and by related proportions, rather than 
about an existing population. 

Random Observations. Statistical logic has as its goal the 
generalizing from a sample to a population. The degree of con- 
fidence which can be placed in such a generalization can be es- 
timated only if the observations may be considered as drawn at 
random from the population. Consequently, much thought must 
be given to methods of achieving randomness when a statistical 
investigation is planned. 

When a statistical study deals with observations on a finite 
population, say the population of a city, an element is considered 
to be drawn at random if, and only if, some device is used by 
means of which any one element of the population is as likely to 
be drawn as any other element. This situation never obtains 
by conscious choice. It is completely impossible for a person to 
pick a book “at random” from a bookcase or even to pluck a
blade of grass “at random” if he looks at it. Some device designed
to produce a random selection is essential. If the population is 
finite, each individual in it may be given a number, those numbers
may be written on little tags, the tags placed in a large
container and thoroughly stirred, and a tag drawn out by a blind- 
folded person, or shaken out by some mechanical device. That 
would be correct but laborious. An easier and equally correct 
procedure is to make use of a table of random numbers such as is 
described in Chapter 6. 

The criterion that every individual in the population must 
have the same chance to be included in the sample is necessary 
but is not sufficient to define a random sample. Suppose, for 
example, that one has 5000 cards of members of a union arranged 
alphabetically, and that a sample of 50 is needed. Suppose the 
method for sampling is to select an individual card at random 
and then take every 10th card in succession after it until 50 cards 
have been drawn. Any individual has the same chance to be 
chosen as any other but this is not a random sample. It is also 
necessary that every set of 50 must have the same chance to be 
chosen as any other set of 50 and there are many sets of 50 cards 
which could not possibly be obtained by this procedure. A better 
way would be to go on drawing individuals at random until 50
have been chosen. Strictly speaking, each card should be returned 
to the file after it has been drawn so that it could possibly be 
drawn more than once, but that is seldom done and failure to 
replace the individual after drawing has negligible effect on the 
outcome unless the population is small. 
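
The contrast between the two procedures can be made concrete with a short computation. The sketch below is in Python and is not part of the original text. It assumes the cards are numbered 0 to 4999, that the every-10th-card count wraps around to the front of the file when it passes the last card (the text does not say), and that the simple random sample is drawn without replacement. The names `systematic_sample` and `simple_random_sample` are hypothetical.

```python
import random
from math import comb

POPULATION = 5000   # cards in the file
SAMPLE_SIZE = 50
STEP = 10           # take every 10th card


def systematic_sample(start):
    """A random starting card, then every 10th card after it.
    Wrapping past the end of the file is our assumption."""
    return frozenset((start + STEP * j) % POPULATION for j in range(SAMPLE_SIZE))


def simple_random_sample(rng):
    """Fifty cards drawn at random without replacement."""
    return frozenset(rng.sample(range(POPULATION), SAMPLE_SIZE))


# Every set of 50 cards that the systematic procedure can possibly produce.
reachable = {systematic_sample(start) for start in range(POPULATION)}

# Each card appears in the same number of reachable sets, so each card has
# the same chance of selection, yet almost no set of 50 can ever be drawn.
appearances = {sum(card in s for s in reachable) for card in (0, 1234, 4999)}
print(len(reachable), appearances)
print(comb(POPULATION, SAMPLE_SIZE) > len(reachable))

print(sorted(simple_random_sample(random.Random()))[:5])
```

Under the systematic procedure no two adjacent cards can ever appear in the same sample, whereas the simple random draw can produce any of the possible sets of 50.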

In an experimental study it is not possible to enumerate all 
elements and to sample them randomly in the manner just de- 
scribed. In such a study it is necessary to plan the experiment 
so as to provide a fair representation of the population. To take
a simple example, consider an experiment to test the hypothesis 
that a given coin is unbiased, that in a fair toss it is as likely to 
fall head up as to fall tail up. An experiment requires that a 
number of fair tosses be made. To insure fairness several persons 
may be asked to make tosses. The several samples of tosses may 
then be compared, or combined. 

If the problem is extended to a test of whether all 25-cent 
coins are unbiased, evidence from one coin would not be sufficient. 
Samples from a number of sources would be selected and each coin 
subjected to tosses by one or more individuals. 

The experiments described in Chapter 1 must also be ap- 
proached from this point of view. The photographs must be 
selected so as to provide a fair representation of the material which 
the subject may normally be expected to compare. Also the time 
chosen for the experiment must be fair to the subject. It is ob- 
viously impossible to choose these elements randomly from all 
possible elements. However, random devices can help in assuring 
fairness. 

In the experiment in which photographs are to be arranged in 
the order of intelligence of the persons photographed, the photo- 
graphs may be chosen randomly from a larger stock of available 
photographs. In this way the bias of the experimenter is elimi- 
nated. If possible the time, or repetitions of time, of the experi- 
ment, may be randomized. 

In the experiment in which pairs of photographs are compared, 
it is desirable not only to choose them at random from a larger set 
of pairs, but also to randomize the order in which the pairs are 
presented. In addition each pair should be shuffled so that the 
photograph representing the higher intelligence is as often on the 
right as it is on the left. The composition of the pairs may itself 
be determined by a random process.

In each experiment the element of randomization must be
considered to insure a fair representation of the population which 
is studied. In an experiment comparing two teaching methods 
both methods may be tried by several teachers in each of several 
communities. Ideally the communities should be chosen at ran- 
dom. In addition, within each community, half the available 
teachers should be assigned at random to teach by one of the 
methods and half by the other. If such precautions are followed, 
the probability laws discussed in this book are applicable to the 
experiment. 
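
The within-community assignment described above might be carried out as in the following sketch. The teacher labels, the method names "A" and "B", and the function name are hypothetical, and the seed is fixed only to make the illustration repeatable.

```python
import random

def assign_methods(teachers, rng):
    """Randomly split the teachers of one community between two methods."""
    shuffled = list(teachers)
    rng.shuffle(shuffled)            # random order removes the experimenter's choice
    half = len(shuffled) // 2
    return {"A": shuffled[:half], "B": shuffled[half:]}

rng = random.Random(1953)            # fixed seed, for a repeatable illustration only
community_teachers = ["T1", "T2", "T3", "T4", "T5", "T6"]   # hypothetical labels
assignment = assign_methods(community_teachers, rng)
print(assignment)
```

The same function would be applied separately within each community, so that every community contributes teachers to both methods.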

Probability. Suppose that in a population each observation
falls in one and only one of the classes $C_1, C_2, \ldots, C_k$, and
suppose the proportions of observations in these classes to be
$P_1, P_2, \ldots, P_k$. If an observation is drawn at random from the
entire population, $P_1$ is the probability that it will come from
class $C_1$; the probability that it will come from $C_2$ is $P_2$; and,
in general, the probability that it will come from $C_i$ is $P_i$. In
Table 1.1 on page 4 there are five classes (or categories) defined
in terms of the number of photographs correctly placed. These
classes may be designated $C_0$, $C_1$, $C_2$, $C_3$, and $C_4$, and the
corresponding proportions shown in that table are properly
called probabilities. Similarly Table 1.2 has six classes and the
corresponding proportions are called probabilities.
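
The idea that a probability is simply the proportion of the population in a class can be illustrated by enumeration. The sketch below assumes that the experiment behind Table 1.1 is the arrangement of 4 photographs in order, with every arrangement equally likely under the hypothesis; the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

true_order = (1, 2, 3, 4)             # assumed: four photographs in their correct order

# For every possible arrangement, count the photographs in their correct position.
counts = Counter(
    sum(placed == correct for placed, correct in zip(arrangement, true_order))
    for arrangement in permutations(true_order)
)
total = sum(counts.values())           # number of equally likely arrangements

# Proportion of arrangements falling in each class C_0 ... C_4.
probabilities = {k: Fraction(counts[k], total) for k in range(5)}

for k, prob in probabilities.items():
    print(k, prob, round(float(prob), 3))
```

Because the five classes are defined by counting all possible arrangements, the proportions add to 1.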

In the situations considered in Chapter 1 the student could 
compute the probability that a random observation would fall in 
a particular class, because he could enumerate all possible forms
that an observation could take. Later on we shall deal with prob- 
lems for which the appropriate probability must be read from a 
table prepared by a mathematician. Before that stage is reached, 
it is important for the student to develop clear concepts about the 
meaning of probability in these simpler situations. 

Probability Distribution. A distribution showing a set of
defined classes $C_1, C_2, \ldots, C_k$ and the probability that a random
observation X will fall in each of those classes is called a probability
distribution. Thus the tables in Chapter 1 display probability
distributions.

Sometimes this random observation is a group character
computed from a sample, such as a mean, or a standard deviation, or
a per cent, or a coefficient of correlation, etc. A group character
obtained from an observed sample is called a statistic. A
probability distribution showing the probability that a statistic from a
random sample will fall in each of a set of defined classes is called
a sampling distribution. In later chapters we shall speak of the
sampling distribution of the mean, the sampling distribution of a
per cent, and the like.

Symbolism. A shorthand manner of writing probability state- 
ments is convenient because it reduces the amount of writing to 
be done and enables the eye to take in at a glance an idea which 
would be grasped more slowly if expressed entirely in words. The 
sentence “$P_i$ is the probability that a random observation X will
fall in the class $C_i$” would be written symbolically as

        $P(X \text{ is in } C_i) = P_i$
or      $\Pr(X \text{ is in } C_i) = P_i$

Sometimes curled braces { } are used instead of round parentheses.


The expression inside the parentheses may be varied to suit the 
occasion. Thus for the data of Table 1.1 one might write 


        P{4 correct placements} = .042
        P{2 correct placements} = .250, etc.
and for the data of Table 1.2 on page 7 one might write 


Pr(all choices correct) = .0312 
Pr(3 choices correct) = .3125, etc. 


The probability that a head will appear on a single toss of a coin 
might be written merely P(head) or P(H). $P_i$ is also, for brevity,
called the probability of the class $C_i$.


Symbolic Statements of Probability

Words                                           Symbol

The probability that an observation X will      $P(X \text{ is in } C_i)$
fall in Class $C_i$

The probability that an observation X will      $P(X \text{ is in } C_i \text{ or } C_j)$
fall either in Class $C_i$ or in Class $C_j$

The probability that if a first observation     $P(X_2 \text{ is in } C_j \mid X_1 \text{ is in } C_i)$
has been found to be in Class $C_i$, a second
will be found to be in Class $C_j$


Sometimes it is desirable to include in the probability
symbolism a conditional clause, such as “the probability of four
correct choices when the number of choices to be made is 5,” or
“the probability of placing exactly 2 photographs in correct
position if the number of photographs to be arranged is 4,” or
“the probability that a second observation falls in class j when a
first observation has already fallen in class i.” For this purpose
a vertical line is used to mean “when” or “if” and the conditional
clause is written after the line. The three sentences just quoted
might be written as follows:


        $P(4 \text{ choices correct} \mid N = 5)$
        $P(2 \text{ correct} \mid N = 4)$
        $P(X_2 \text{ is in } C_j \mid X_1 \text{ is in } C_i)$


Mutually Exclusive Classes. In a system of classification, if 
it is impossible for the same observation to be placed in each of 
two classes simultaneously, those classes are said to be incom- 
patible or mutually exclusive. Thus, if children are classified by 
age, the class of 5-year olds and the class of 6-year olds are mu- 
tually exclusive. If children are classified by sex the class of boys 
and the class of girls are mutually exclusive. However, the class 
of boys and the class of 6-year olds are not mutually exclusive 
because an individual may belong in both classes simultaneously. 

Exhaustive Classes. If in a system of classification every 
observation must be in at least one of the classes named, then 
those classes are exhaustive. Thus in Table 1.1 every possible 
selection must fall in one of the five classes named; no other
classes are possible in this experiment, so these five classes are
exhaustive as well as mutually exclusive.

In a group of children aged 4 to 10, the class of 5-year olds
and the class of 6-year olds are mutually exclusive but not
exhaustive; the class of age 4 to 7 and the class of age 6 to 10 are
exhaustive but not mutually exclusive; the class of boys and the
class of girls are both exhaustive and mutually exclusive.

If classes are both exhaustive and mutually exclusive the sum
of their probabilities is 1, as in Tables 1.1 and 1.2.
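
The two definitions can be stated as simple checks on sets. The following sketch applies them to the example of children aged 4 to 10; the function names are hypothetical and ages are treated as whole numbers.

```python
def mutually_exclusive(classes):
    """True if no element belongs to more than one class."""
    seen = set()
    for members in classes:
        if seen & members:
            return False
        seen |= members
    return True

def exhaustive(classes, population):
    """True if every element of the population belongs to at least one class."""
    return set(population) <= set().union(*classes)

ages = range(4, 11)                                    # children aged 4 to 10

five_and_six = [{5}, {6}]                              # 5-year olds and 6-year olds
overlapping = [set(range(4, 8)), set(range(6, 11))]    # ages 4 to 7 and ages 6 to 10

print(mutually_exclusive(five_and_six), exhaustive(five_and_six, ages))
print(mutually_exclusive(overlapping), exhaustive(overlapping, ages))
```

The first pair of classes is mutually exclusive but not exhaustive; the second pair is exhaustive but not mutually exclusive.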

Independence of Observations. Independence of observations 
is an important idea which will occur again and again throughout 
this book. Two observations are considered to be independent when 
information about one of them provides no clue whatever as to the 
other. Two observations are considered to be dependent if it is 
possible to make a better guess about one of them when you know 
what the other one is. 

An illustration will clarify this concept. Suppose a housing 
survey of a community is to be made by means of a sample in
order to determine the extent of overcrowding, and suppose a list
of all dwelling units in the community is available. Suppose
further that by some random procedure a dwelling is selected,
the interviewer visits it, and finds that it is a one-family house
of 10 rooms occupied by 3 people. Now let a second dwelling
unit be selected at random. Does the nature of the first observa- 
tion affect in any way the probability that the second observation 
will be a substandard dwelling? A superior housing unit? If 
selection is made at random, the probability of obtaining a par- 
ticular type of home is the same at every selection regardless of 
what was obtained on an earlier observation. In such a situation 
observations are independent. 

Now by contrast, let us assume that in order to save time and 
travel, an interviewer who has been asked to visit 5 homes takes 
all of them in the same block. If one of these homes is the 10- 
room house with 3 occupants mentioned in the preceding para- 
graph, the probability that any of the others will be a cold-water 
flat of 3 rooms occupied by 5 persons is very much smaller than 
the probability that a dwelling drawn at random from the entire 
list would be such a cold-water flat. These 5 observations made 
on homes in one block are not independent. 

Observations which are independent with respect to one
population may be dependent with respect to another. If the
population being studied is the families living in Van Buren County,
Iowa, a random sample of families from that county provides a
set of independent observations. However, if the population 
being studied is the families in rural Iowa, then a random sample 
of the families in Van Buren County provides a set of dependent 
observations. 


Laws for the Combination of Probabilities 


Addition Law: If two or more classes are mutually exclusive, the proba- 
bility that an observation will be in one or the other is the sum of the 
probabilities of the separate classes. 

Multiplication Law: If $X_1$ and $X_2$ are independent observations, the
joint probability that $X_1$ will be in $C_i$ and $X_2$ will be in $C_j$ is the product
of their separate probabilities.
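
Both laws can be verified by counting equally likely cases. The sketch below uses the throw of a perfectly balanced die as a hypothetical illustration: the addition law gives the probability of an odd face, and the multiplication law gives the probability that two independent throws are both odd, which is then checked against a direct count of the 36 possible pairs.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
p_face = Fraction(1, 6)                # assumed: a perfectly balanced die

# Addition law: the classes "1", "3", and "5" are mutually exclusive.
p_odd = sum(p_face for face in faces if face % 2 == 1)

# Multiplication law for two independent throws, checked against a direct
# count of the 36 equally likely pairs.
pairs = list(product(faces, repeat=2))
p_both_odd_by_count = Fraction(
    sum(1 for first, second in pairs if first % 2 == 1 and second % 2 == 1),
    len(pairs),
)
p_both_odd_by_law = p_odd * p_odd

print(p_odd, p_both_odd_by_law, p_both_odd_by_count)
```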


Several observations on one individual may be considered
independent if that individual alone is being studied. For example,
one might be interested in variations of his blood pressure and
might make observations at varying times. In this case, general- 
ization would be to the blood pressure of this particular person. 
If a population of persons is being studied, several measurements 
of the blood pressure of the same person on different occasions 
would be dependent. 

The Probability of Two or More Occurrences. The impor- 
tance of the ideas of dependence and independence among observa- 
tions is due to the probability relations in the possible outcomes 
of the observations. 

Consider again the survey of a community with the purpose of 
determining overcrowding in homes. Suppose that a standard 
of comfort in the home has been adopted and that by this standard
one third of the families in the community live in overcrowded
conditions. Suppose also that the community is fairly large.
Then, if two families are observed independently, the probability
that both live in overcrowded conditions is $\frac{1}{3} \times \frac{1}{3} = \frac{1}{9}$. In a sample
of independent observations, the probability that all or even a
considerable proportion of families selected independently for
observation live in overcrowded conditions is very small. The
probability is high that in such a sample the proportion of families
living in overcrowded homes will not differ greatly from $\frac{1}{3}$.
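
Repeated use of the multiplication law shows how quickly the probability of an all-overcrowded sample shrinks as the number of independent observations grows. This is a small sketch using the one-third proportion from the example; the sample sizes shown are arbitrary choices.

```python
from fractions import Fraction

p_overcrowded = Fraction(1, 3)    # proportion of overcrowded homes in the example

def prob_all_overcrowded(n):
    """Probability that n independently selected families all live in overcrowded homes."""
    return p_overcrowded ** n

for n in (2, 5, 10):              # arbitrary sample sizes, for illustration
    print(n, prob_all_overcrowded(n), float(prob_all_overcrowded(n)))
```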

If, however, one observation is made by choosing a family at 
random and then a neighbor of this family is questioned to pro- 
vide a second observation, the probability that the second family 
will live in an overcrowded home is determined by the home
conditions of the first family. If the first family lives in an
overcrowded home then the chance that their neighbor will also live
in such a home is likely to be far in excess of $\frac{1}{3}$. The joint
probability that both families live in overcrowded homes is then a
number far greater than $\frac{1}{9}$. If an entire sample is chosen from the
same neighborhood the proportion of overcrowded homes in the 
sample is likely to differ greatly from the corresponding propor- 
tion in the population. 

Consider a very extreme case in which one family is chosen at 
random and then all the other families in the sample are taken 
from the immediate neighborhood of the first one. The survey 
might very well indicate that all families in the community lived 
in substandard houses, or that all lived in affluence. It could not 
yield information of much value. 

It should be recognized that in some situations it is an excellent
procedure to choose many neighborhoods (say m) at random and
then to make observations on a predetermined number of
individuals (say n) in each neighborhood. However, the results
cannot be treated as a random sample of mn individuals, but must
be treated by special methods. Moreover, the number of randomly
selected neighborhoods must not be too small.


EXERCISE 2.1 

1. Let X represent the result of throwing a die. All possible results
may be grouped in six classes, each class defined by the number appearing
on the exposed face of the die. Thus to say X = 5 is to state that the result
falls in the class characterized by having the exposed number a 5. As
these six classes are exhaustive and mutually exclusive, the sum of the
probabilities associated with them is 1. If the die is perfectly balanced,
all classes have the same probability and so the probability of any one
class (that is, the probability of the appearance of any particular face)
is $\frac{1}{6}$. Translate the following statements into symbols. Note: Slight
variations from the form given in the answer key may be acceptable.

a. The probability of obtaining a 5 is $\frac{1}{6}$. Ans. $P(X = 5) = \frac{1}{6}$

b. The probability of obtaining a 1 is the same as the probability of
obtaining a 2 or a 3 . . . or a 6.

c. The probability of obtaining a 1 or a 2 is $\frac{1}{6} + \frac{1}{6} = \frac{1}{3}$.

d. The probability of obtaining an odd number is $\frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$.

e. The probability of obtaining a number smaller than 4 is $\frac{1}{2}$. Note
these definitions:

    $X > A$       X is greater than A
    $X < A$       X is less than A
    $X \geq A$    X is greater than or equal to A, that is,
                  X is not less than A
    $X \leq A$    X is less than or equal to A, that is,
                  X is not greater than A

Ans. $P(X < 4) = P(X = 1 \text{ or } 2 \text{ or } 3)$
     $= P(X = 1) + P(X = 2) + P(X = 3)$
     $= \frac{1}{6} + \frac{1}{6} + \frac{1}{6} = \frac{1}{2}$

f. The probability of not obtaining a 6 is 1 minus the probability of
obtaining a 6, and this is $1 - \frac{1}{6} = \frac{5}{6}$.

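2. Two perfectly balanced dice are thrown. Let $X_1$ represent the
number exposed on the first and $X_2$ the number exposed on the second.
Translate the following symbols into words.

a. $P(X_1 = 3 \text{ and } X_2 = 3) = P(X_1 = 3) \cdot P(X_2 = 3) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$

b. $P(X_1 + X_2 = 2) = P(X_1 = 1) \cdot P(X_2 = 1) = \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$

c. $P(X_1 + X_2 = 3) = P(X_1 = 1 \text{ and } X_2 = 2) + P(X_1 = 2 \text{ and } X_2 = 1)$
   $= \frac{1}{6} \cdot \frac{1}{6} + \frac{1}{6} \cdot \frac{1}{6} = \frac{2}{36} = \frac{1}{18}$

d. $P(X_2 = 5 \mid X_1 = 6) = \frac{1}{6}$

e. $P(X_2 = 6 \mid X_1 = 6) = \frac{1}{6}$

f. $P(X_1 \text{ and } X_2 \text{ are both odd}) = P(X_1 \text{ odd}) \cdot P(X_2 \text{ odd})$
   $= P(X_1 = 1 \text{ or } 3 \text{ or } 5) \cdot P(X_2 = 1 \text{ or } 3 \text{ or } 5)$
   $= (\frac{1}{6} + \frac{1}{6} + \frac{1}{6})(\frac{1}{6} + \frac{1}{6} + \frac{1}{6}) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

g. $P(X_1 \text{ and } X_2 \text{ both} < 6) = P(X_1 < 6) \cdot P(X_2 < 6)$
   $= (1 - \frac{1}{6})(1 - \frac{1}{6}) = \frac{5}{6} \cdot \frac{5}{6} = \frac{25}{36}$

h. $P(X_1 + X_2 = 11) = P(X_1 = 6 \text{ and } X_2 = 5) + P(X_1 = 5 \text{ and } X_2 = 6)$
   $= P(X_1 = 6) \cdot P(X_2 = 5) + P(X_1 = 5) \cdot P(X_2 = 6)$
   $= \frac{1}{36} + \frac{1}{36} = \frac{1}{18}$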

Factorial. It is convenient to have a term and a symbol to 
represent the product of consecutive integers beginning with 1. 
Thus 1·2·3·4·5·6·7 = 5040 is called factorial 7 or 7 factorial.
It is denoted by the symbol 7!. In general if N is a positive integer,


(2.1)    $N! = N(N-1)(N-2) \cdots 4 \cdot 3 \cdot 2 \cdot 1$


It is convenient to define factorial zero as equal to 1. This
definition simplifies notation and is consistent with other notation of
factorials.


The symbol $\binom{N}{r}$ is used to represent a relation which occurs
often in work with probability, especially in developing the
binomial distribution, below. It is defined as


(2.2)    $\binom{N}{r} = \frac{N!}{r!\,(N-r)!}$


This relation is of interest because it gives the number of ways in
which r objects can be selected from a group of N objects.
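
The two definitions translate directly into a short computation. The sketch below writes them out by hand (the function names `factorial` and `choose` are hypothetical) and compares the result with Python's built-in `math.comb`.

```python
import math

def factorial(n):
    """Product of the integers 1 through n, with factorial zero defined as 1."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result

def choose(n, r):
    """Number of ways in which r objects can be selected from a group of n objects."""
    return factorial(n) // (factorial(r) * factorial(n - r))

print(factorial(7))                       # 1*2*3*4*5*6*7
print(choose(5, 2), math.comb(5, 2))      # hand-written and built-in versions agree
```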


EXERCISE 2.2 
1. Verify the following relations: 
a. 4! = 4·3·2·1 = 24                    g. 8! = 8(7!)
b. 5! = 5·4·3·2·1 = 120                 h. 20! = 20(19)(18!)
5!_ 54.82.1 _ ‚” agit 
Er a ar as al hig 880 
aiy j. 91. 9876) 
5 aE k. ri = 9(8)(7) 
` 10! 90 10! 
L 5-210 
"e Ma 6141 

2. Find the values of the following expressions:


, 9 _ 191 е. 013! 5171 

vigi * 20! g 16:15:04) Е. бм! 
15! 10! : 181 1016141 

b. 731 4. om Be 3112! 


3. Verify the following: 


a. (5) = 56 d. (8 = Е. (3) -1 
() = 126 e. is = 220 h. (a = 220 


с. (*) =20 ГА (5) = 2300 i ( x = 4368 
3 5, 


A Population of Two Classes. Perhaps the simplest
population which can be considered is one which consists of only two
classes. Such a population was described in Chapter 1 in the
experiment to judge the ability to select the more intelligent of
two persons by comparing their photographs. The population 
considered there consists of two classes: 1. the class of correct 
selections, and 2. the class of incorrect selections. A population or 
sample divided into two classes is often termed a dichotomy. 

The hypothesis formulated in Chapter 1 that a correct
selection was as likely to occur as an incorrect may be formulated in
the language of this present chapter by saying that the
proportion of elements in each class is one half.

A different hypothesis as to the number of correct solutions
would provide a different description of the population.
Consider, for example, the hypothesis that a correct selection is four
times as likely to occur as an incorrect selection. The class of
correct selections would then contain $\frac{4}{5}$ of the elements, whereas
the class of incorrect selections would contain $\frac{1}{5}$ of the elements.
The two populations described by these two hypotheses are
displayed graphically in Figures 2-1 and 2-2. The proportion of
correct selections may be any number between zero and one 
depending on the circumstances. The proportion of incorrect 
selections is one minus the proportion of correct selections. 

Populations consisting of two classes occur very often, for
example: male and female, married and unmarried, passed and
failed. Any population of two classes is fully determined by the
proportion of elements in either one of the classes. For
convenience in writing formulas the symbol P will be introduced to
signify the proportion of elements in one of the classes of a
two-class population. The proportion in the other class is 1 - P. For
simplicity in writing it is customary to use the symbol Q for
1 - P.


Fig. 2-1. Population distribution under the hypothesis that correct
and incorrect selections are equally likely. (Classes shown: incorrect
selections and correct selections.)

Fig. 2-2. Population distribution under the hypothesis that correct
selections are four times as likely as incorrect. (Classes shown:
incorrect selections and correct selections.)


While P may vary from population to population it is a fixed 
number for any one population. The proportion in a sample 
should be distinguished from that in the population. Thus if a 
population of persons contains 50% men, a sample, even though 
random, might contain 40%, or 70%, or even 100%, men. If the 
sample is random, and not very small, the probability of extreme 
deviation from 50% men is small. 

To distinguish between the proportion in the population, which 
we have called P and the proportion in a sample, we shall use a 
lower case p to denote the latter. 
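
The distinction between the fixed population proportion and the varying sample proportion can be illustrated by simulation. The following is a sketch only: the sample sizes of 5 and 100, the number of repetitions, the margin of .2, and the function names are arbitrary choices, and the seed is fixed to make the run repeatable.

```python
import random

def sample_proportion(P, n, rng):
    """Proportion of men in a random sample of n persons from a population with proportion P."""
    return sum(rng.random() < P for _ in range(n)) / n

def share_extreme(proportions, P, margin=0.2):
    """Share of samples whose proportion differs from P by more than `margin`."""
    return sum(abs(p - P) > margin for p in proportions) / len(proportions)

rng = random.Random(0)             # fixed seed so the illustration is repeatable
P = 0.5                            # population proportion: fixed for this population

small = [sample_proportion(P, 5, rng) for _ in range(2000)]     # very small samples
large = [sample_proportion(P, 100, rng) for _ in range(2000)]   # larger samples

print(share_extreme(small, P), share_extreme(large, P))
```

In runs of this kind the very small samples stray far from P much more often than the larger ones do, while P itself never changes.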

Parameter and Statistic. A quantity like P which distin- 
guishes one population from another similar population is called 
a parameter. The reader should note that P stands for proportion
and not for parameter. Other parameters such as the mean and
standard deviation will be introduced later. Many populations
involve more than one parameter. The sample proportion p is
called a statistic. It varies from sample to sample of any given
population. Thus for random samples out of a population with
parameter P, the sample statistic p fluctuates around P.

The Binomial Distribution. The binomial distribution is the
sampling distribution of p for random samples out of a two-class
population. This statement means that the terms of the
binomial distribution provide the probability for each possible
value that p may take in a random sample of N elements.

To make the discussion concrete, let us return to the experi- 
ment for selecting the more intelligent of two persons by a com- 
parison of their photographs. Suppose that in the population the 
probability of a correct solution is P and hence the probability 
of an incorrect solution is Q. Suppose a random sample of two 
pairs of photographs is taken. Since the two pairs are independent 
observations, the multiplication law for combining probabilities 
applies. The four possible outcomes and the probability
associated with each are:


Outcome   First selection   Second selection   Probability
   1      Correct           Correct            $P \times P = P^2$
   2      Correct           Incorrect          $P \times Q = PQ$
   3      Incorrect         Correct            $Q \times P = PQ$
   4      Incorrect         Incorrect          $Q \times Q = Q^2$


These four outcomes might be described in terms of the numbers 
of correct selections, which are respectively 2, 1, 1, and 0. They 
might also be described in terms of the proportions of selections 
made correctly, which are respectively $\frac{2}{2} = 1.00$, $\frac{1}{2} = .50$, $\frac{1}{2} = .50$, and
$\frac{0}{2} = 0$. In either case the two middle outcomes may be grouped
together, as they yield the same number of correct selections.
The sampling distribution for samples of 2 cases is then:


Np (number of          p (proportion of        Probability
correct selections)    correct selections)
        2                    1.00               $P^2$
        1                     .50               $2PQ$
        0                     .00               $Q^2$

The sum of the probabilities is $Q^2 + 2PQ + P^2$ which is
$(Q + P)^2 = (1)^2 = 1$.


For samples of any number of pairs the sampling distribution may
be developed in similar fashion, but the task is a very tedious one.
The following table shows probabilities for samples of 5 pairs.
Generalization from 5 to other values of N can be facilitated by
writing the numerical coefficients in the $\binom{N}{r}$ notation, and that
has been done in the right-hand column.


Np (number of correct    p (proportion of correct    Probability
selections in samples    selections in samples
of 5)                    of 5)
        5                      1.00                  $P^5 = \binom{5}{5} P^5$
        4                       .80                  $5QP^4 = \binom{5}{4} Q P^4$
        3                       .60                  $10Q^2P^3 = \binom{5}{3} Q^2 P^3$
        2                       .40                  $10Q^3P^2 = \binom{5}{2} Q^3 P^2$
        1                       .20                  $5Q^4P = \binom{5}{1} Q^4 P$
        0                       .00                  $Q^5 = \binom{5}{0} Q^5$


The sum of the six probabilities is $(Q + P)^5 = (1)^5 = 1$. Here
we have shown the binomial sampling distribution for samples of 5.
To generalize, call the elements in one of the two population
classes “successes” and let P be the probability of a success.
Then for a random sample of N cases the probability of exactly
r successes is


(2.3)    $\binom{N}{r} P^r Q^{N-r}$


This generalization leads to the sampling distribution of p for 
samples of N cases from a two-class population, which is given in 
Table 2.1.

The probabilities in the informal table previously given for 
samples of 5, and the probabilities in Table 2.1 for samples of any 
size apply either to the proportion of successes or to the number 
of successes. 
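
Formula (2.3) can be evaluated for every possible number of successes at once. The sketch below uses exact fractions so that the terms can be compared with the informal table for samples of 5; the function name `binomial_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_distribution(n, P):
    """Probability of exactly r successes in a random sample of n cases, for r = 0 ... n."""
    Q = 1 - P
    return {r: comb(n, r) * P**r * Q**(n - r) for r in range(n + 1)}

# Samples of 5 pairs under the hypothesis that P = 1/2.
distribution = binomial_distribution(5, Fraction(1, 2))

for r in sorted(distribution, reverse=True):
    print(r, Fraction(r, 5), distribution[r])

print(sum(distribution.values()))      # the terms of (Q + P)^5 add to 1
```

The same probabilities apply whether the rows are labeled by the number of successes r or by the proportion r/N.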

TABLE 2.1 The Binomial Distribution

Sampling distribution for p from samples of N
cases from a two-class population

Np (number of      p               Probability
successes)
    N              1               $\binom{N}{N} P^N$
    N - 1          (N - 1)/N       $\binom{N}{N-1} P^{N-1} Q$
    N - 2          (N - 2)/N       $\binom{N}{N-2} P^{N-2} Q^2$
    . . .          . . .           . . .
    r              r/N             $\binom{N}{r} P^r Q^{N-r}$
    . . .          . . .           . . .
    1              1/N             $\binom{N}{1} P Q^{N-1}$
    0              0               $\binom{N}{0} Q^N$
    Total                          $(Q + P)^N = 1$


The fact that the sum of probabilities in the right column is
$(Q + P)^N$ is of interest since it illustrates the fact that the
binomial distribution is a special case of the binomial expansion
which is treated in textbooks on algebra. Because Q + P is always
equal to 1, $(Q + P)^N = 1$ no matter what value N has. Inasmuch
as the classes defined by the number of successes are mutually
exclusive and exhaustive, the sum of their probabilities must
be 1.00.

The Graphic Representation of the Binomial Distribution. 
Figures 2-1 and 2-2 on page 20 showed two populations in which 
P = .5 and P = .8 respectively. Suppose a sample of 5 cases is
drawn from each of these populations. The number of successes
which can be obtained in that sample can only be one of the
integers 0, 1, 2, 3, 4, or 5 and so the proportion of successes can
only be $\frac{0}{5}$, $\frac{1}{5}$, $\frac{2}{5}$, $\frac{3}{5}$, $\frac{4}{5}$, or $\frac{5}{5}$, as shown in Figure 2-3. These values
are represented as discrete points on a linear scale. The variable
exists only at these points. The probability associated with each
value of p for samples of 5 from a population in which P = .5
is given numerically in column 3 of Table 2.2 and shown by a
vertical line in Figure 2-4. These probabilities are the respective
terms of $(.5 + .5)^5$. The corresponding probabilities when P = .8,
obtained from the terms of $(.2 + .8)^5$, are given numerically in
column 5 of Table 2.2 and shown graphically in Figure 2-5.


Fig. 2-3. Possible values of p in samples of 5 cases. (Points at
0, .2, .4, .6, .8, and 1.0 on a linear scale.)


Fig. 2-4. Distribution of p in samples of 5 from a two-class
population in which P = .5, p being represented as a discrete
variable. (Vertical axis: probability; horizontal axis: scale of p.)

Fig. 2-5. Distribution of p in samples of 5 from a two-class
population in which P = .8, p being represented as a discrete
variable. (Vertical axis: probability; horizontal axis: scale of p.)


TABLE 2.2 Probability Associated with each Possible Value of p in Samples
of 5 when P = .5 and when P = .8, and Corresponding Cumulative Probability

                           P = .5                        P = .8
Value    Value                    Cumulative                    Cumulative
of Np    of p     Probability     probability    Probability    probability
  1        2           3               4              5              6
  5       1.0        .03125         1.00000         .32768        1.00000
  4        .8        .15625          .96875         .40960         .67232
  3        .6        .31250          .81250         .20480         .26272
  2        .4        .31250          .50000         .05120         .05792
  1        .2        .15625          .18750         .00640         .00672
  0        .0        .03125          .03125         .00032         .00032
                    1.00000                        1.00000


Even in elementary work in statistics it often is found
convenient to treat a discrete variable as though it were continuous.
Thus the mean number of children in a group of families might
be reported as $\bar{X} = 1.53$, or the 90th percentile reported as 3.87.
These numbers are obviously artifacts but they are meaningful
artifacts. No family has 1.53 children but the statistic gives useful
information about the families under consideration. The
distribution of the number of children per family might be represented
graphically as a histogram where the numbers 0, 1, 2, . . .
indicating number of children per family are represented not as
discrete points on a line but as midpoints of intervals. In such
a figure the number of families with 2 children would be represented
by the area of a rectangle with base extending from 1.5 to 2.5 on
the horizontal axis and with height proportional to the number of
families having 2 children. The number 0 on the horizontal scale
would be represented by an interval extending from -.5 to +.5
even though no families could have a negative number of children.
All intervals being of the same width the areas of the rectangles
which form the histogram are proportional to their heights and
to the frequencies represented.

The binomial probability distribution may also be represented 
by a histogram which, although an artifact, brings out certain 
general concepts better than the discrete distribution. Figures 
2-6 and 2-7 are such histograms superimposed on the graphs of
the discrete distributions of Figures 2-4 and 2-5. In these figures 
probability is represented by area. The ordinate is then called 
the probability density. 
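
The relation between probability and probability density can be made explicit with a short computation. The sketch below assumes that each possible value of p is the midpoint of an interval whose width equals the spacing between values (.2 for samples of 5), by analogy with the histogram of children per family described above; the exact construction of Figures 2-6 and 2-7 is not recoverable from the text, and the function name is hypothetical.

```python
from math import comb

def binomial_density(n, P):
    """Histogram bars (left edge, right edge, height) with each value of p as a midpoint."""
    width = 1 / n                       # assumed: interval width equals spacing of p values
    bars = []
    for r in range(n + 1):
        probability = comb(n, r) * P**r * (1 - P)**(n - r)
        p = r / n
        # Height is chosen so that area (width times height) equals the probability.
        bars.append((p - width / 2, p + width / 2, probability / width))
    return bars

bars = binomial_density(5, 0.5)
total_area = sum((right - left) * height for left, right, height in bars)

for left, right, height in bars:
    print(round(left, 2), round(right, 2), round(height, 5))
print(round(total_area, 10))            # total area under the histogram
```

Because probability is carried by area, the heights depend on the chosen interval width, but the total area remains 1.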


Fig. 2-6. Distribution of p in samples of 5 from a two-class
population in which P = .5, p being represented as a continuous
variable. (Vertical axis: probability density; horizontal axis:
scale of p.)

Fig. 2-7. Distribution of p in samples of 5 from a two-class
population in which P = .8, p being represented as a continuous
variable. (Vertical axis: probability density; horizontal axis:
scale of p.)


Suppose we wish to represent the probability that p = .4 or
less when P = .5. By the Addition Law stated on page 15 this is


    $P(p = 0) + P(p = .2) + P(p = .4) = .03125 + .15625 + .31250 = .50$

as indicated by the cumulative probability in column 4 of Table
2.2. This probability is the sum of the three vertical lines at
p = 0, p = .2, and p = .4 in Figure 2-4 but that sum is not
particularly easy to visualize. The same probability is represented
by the area to the left of p = .5 in Figure 2-6. This area is easily
visualized but its numerical value cannot be directly read from
the scale. Figure 2-8 represents the same cumulative probability
by the length of the vertical line at p = .4. This is readily
visualized and its value can be read directly from the scale. The
cumulative probability curve has great advantages and will be
used frequently throughout this book. The student should study
(1) the way in which Figures 2-8 and 2-9 are plotted from the
cumulative probabilities in Table 2.2, (2) the relation of Figures
2-8 and 2-4, (3) the relation of Figures 2-9 and 2-5, and (4) the
difference between Figures 2-8 and 2-9.


Fig. 2-8. Cumulative probability distribution of p in samples of 5
from a two-class population in which P = .5. (Vertical axis:
probability; horizontal axis: scale of p.)

Fig. 2-9. Cumulative probability distribution of p in samples of 5
from a two-class population in which P = .8. (Vertical axis:
probability; horizontal axis: scale of p.)
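
The cumulative probabilities are running sums of the binomial terms, taken from the smallest value of p upward. The following sketch recomputes them for samples of 5 with P = .5 and P = .8; the function name is hypothetical.

```python
from itertools import accumulate
from math import comb

def cumulative_probabilities(n, P):
    """Cumulative probability that p is r/n or less, for r = 0, 1, ..., n."""
    probabilities = [comb(n, r) * P**r * (1 - P)**(n - r) for r in range(n + 1)]
    return list(accumulate(probabilities))

for P in (0.5, 0.8):
    cumulative = cumulative_probabilities(5, P)
    print(P, [round(c, 5) for c in cumulative])
```

Each list runs from p = 0 up to p = 1.0, so it reads in the opposite direction to the rows of Table 2.2, which are listed from 5 successes down to 0.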


EXERCISE 2.3 

1. If samples of 4 cases each are drawn from a very large population 
in which 30% of the individuals have a certain trait, what is the sampling 
distribution of the number having that trait? Make a graph of the popula- 
tion distribution. Make a graph of the sampling distribution. 

2a. If samples of 3 cases each are drawn from a very large population
in which P = .50, what is the sampling distribution of p? Make a graph of
the population distribution; of the sampling distribution.

b. Do the same for samples of 6 cases.

c. Do the same for samples of 10 cases.

3. In drawing samples from a symmetric population in which P = Q
= .50, what appears to be the effect of increasing the size of the samples?
Base your answer on the results obtained in question 2.

4a. If samples of 3 cases each are obtained from a very large population
in which P = .60, what is the sampling distribution of p? Make a graph of
the population distribution; of the sampling distribution.

b. Do the same for samples of 6 cases. 

c. Do the same for samples of 10 cases. 

5. Answer the questions in item 4 for the case in which P = .2. 


Mean of the Probability Distribution for a Discrete Variable. 
The familiar procedure of computing a mean will now be applied 
to a probability distribution in order to develop certain important 
new coneepts. In Table 2.3, X is the discrete variable “ propor- 
tion of successes in a sample of 5 cases.” Instead of the frequencies 
you are accustomed to using in such computations, we here use 
the probabilities from column 3 of Table 2.2. The mean thus 
computed is found to be .5 which was the value of the population 
proportion. This mean is called the expected value of p or the ex- 
pectation of p and is denoted by the symbol E(p), read “E of p” 
or “expectation of p.”


TABLE 2.3  Computation of the Mean and Variance of p in Samples
of 5 from a Population in which P = .5

X = Value    Probability         (Probability) X      (Probability) X^2
of p         [Analogous to f]    [Analogous to fX]    [Analogous to fX^2]
1.0            .03125              .03125               .03125
 .8            .15625              .12500               .10000
 .6            .31250              .18750               .11250
 .4            .31250              .12500               .05000
 .2            .15625              .03125               .00625
 0             .03125              .00000               .00000
Sum           1.00000              .50000               .30000

Mean = $\dfrac{\sum (\text{Probability}) X}{\sum \text{probabilities}} = \dfrac{.5}{1.0} = .5$

Variance = $\dfrac{\sum (\text{Probability}) X^2}{\sum \text{probabilities}} - (\text{mean})^2 = \dfrac{.3}{1.0} - (.5)^2 = .05$

Standard deviation = $\sqrt{.05} = .224$
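
The arithmetic of Table 2.3 can be checked with a short Python sketch. The function name `binomial_probability` and the variable names are illustrative choices, not notation from the text, and the probabilities are generated from the binomial formula rather than copied from Table 2.2.

```python
from math import comb, sqrt

def binomial_probability(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

N, P = 5, 0.5

# Possible values of the sample proportion p = X / N and their probabilities.
values = [x / N for x in range(N + 1)]
probs = [binomial_probability(x, N, P) for x in range(N + 1)]

# Probabilities take the place of frequencies in the familiar formulas.
mean = sum(pr * v for pr, v in zip(probs, values))
variance = sum(pr * v**2 for pr, v in zip(probs, values)) - mean**2
std_dev = sqrt(variance)

print(round(mean, 5), round(variance, 5), round(std_dev, 3))
```

Because the probabilities already sum to 1, no division by the sum of the probabilities is needed in the sketch.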


As we shall wish to use the idea of expected value in reference
to other statistics as well as p, let us now state it in more general
terms. Let X denote the variable for which a probability distribution
is given, and let $P_1, P_2, \ldots, P_k$ be the probabilities of
the values $X_1, X_2, \ldots, X_k$. Then the mean of the probability
distribution of X is E(X) and is computed as

(2.4)  $E(X) = \dfrac{\sum P_i X_i}{\sum P_i} = \sum P_i X_i$

since $\sum P_i = 1$. $\sum P_i$ is an abbreviation for “the sum of the $P_i$.” A
fuller description of this symbol is given in Chapter 5.


The parenthesis around X must not be interpreted as indicating
a product. The term expected value originated in games of chance.
If in such a game there are k different possible outcomes with
probabilities $P_1, P_2, \ldots, P_k$ and if the amount received by a
player for each of these outcomes is $X_1, X_2, \ldots, X_k$ (where some
of the X's may be negative if forfeits are involved) then in a long
series of games E(X) as given by Formula (2.4) is the player's
expected return per game. The small Greek letter $\mu$ (pronounced
mu) corresponding to the English letter m will be used as an
abbreviation for the mean of a probability distribution.

(2.5)  $\mu_X = E(X) = \sum P_i X_i$

For the probability distribution of p,

(2.6)  $\mu_p = E(p) = P$
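
The game-of-chance reading of Formula (2.4) may be easier to see in a small sketch. The game used here is hypothetical and is not taken from the text: a fair die is rolled, the player receives 6 units on a six and forfeits 1 unit on any other face. The function name `expected_value` is likewise an illustrative choice.

```python
def expected_value(probabilities, values):
    """E(X) = sum of P_i * X_i, assuming the probabilities sum to 1."""
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * x for p, x in zip(probabilities, values))

# Hypothetical game: win 6 units with probability 1/6, forfeit 1 unit otherwise.
probabilities = [1 / 6, 5 / 6]
payoffs = [6, -1]          # a forfeit enters as a negative value of X

print(expected_value(probabilities, payoffs))
```

Under these assumed payoffs the expected return is 6(1/6) - 1(5/6) = 1/6 of a unit per game, which is the long-run average the player would approach, not the result of any single game.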


Variance and Standard Deviation of the Probability Distribu- 
tion for a Discrete Variable. In Table 2.3 the variance and stand- 
ard deviation of p have been computed in the manner presumably 
familiar to you with probabilities taking the place of frequencies.*

The computation might also have been carried out by sub- 
tracting P from each value of p in turn, squaring, multiplying 
by the probability and summing, thus: 


p - P                (p - P)^2     (p - P)^2 (Probability)

1.0 - .5 =  .5         .25            .0078125
 .8 - .5 =  .3         .09            .0140625
 .6 - .5 =  .1         .01            .0031250
 .4 - .5 = -.1         .01            .0031250
 .2 - .5 = -.3         .09            .0140625
 0  - .5 = -.5         .25            .0078125
Sum                                   .0500000


* Persons for whom this is a first course may find it helpful to study the computation
of mean, variance, and standard deviation in an elementary text, or to turn to
page 115 of this text.




This is exactly the same result as obtained in Table 2.3. The
variance of p will be called the expected value of $(p - \mu_p)^2$, denoted
by the symbol $\sigma_p^2$ ($\sigma$ is the small Greek letter corresponding
to s and is pronounced sigma).

(2.7)  $\sigma_p^2 = E(p - \mu_p)^2 = E(p - P)^2$

In general if X is defined as before to be any variable for which
the probability distribution is given,

(2.8)  $E(X - \mu)^2 = \sum P_i (X_i - \mu)^2 = \sum P_i X_i^2 - \mu^2$

(2.9)  $\sigma_X^2 = E(X - \mu)^2$

The standard deviation $\sigma_X$ is the square root of the variance $\sigma_X^2$.
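
The two forms of the variance in Formula (2.8) can be compared numerically. The following Python sketch uses the distribution of p in samples of 5 when P = .5, with the probabilities of Table 2.3; the function names are illustrative and do not come from the text.

```python
def mean_of_distribution(probs, values):
    """mu = sum of P_i * X_i."""
    return sum(p * x for p, x in zip(probs, values))

def variance_by_deviations(probs, values):
    """sum of P_i * (X_i - mu)^2, the first form in Formula (2.8)."""
    mu = mean_of_distribution(probs, values)
    return sum(p * (x - mu) ** 2 for p, x in zip(probs, values))

def variance_by_shortcut(probs, values):
    """sum of P_i * X_i^2 minus mu^2, the second form in Formula (2.8)."""
    mu = mean_of_distribution(probs, values)
    return sum(p * x**2 for p, x in zip(probs, values)) - mu**2

values = [1.0, 0.8, 0.6, 0.4, 0.2, 0.0]
probs = [0.03125, 0.15625, 0.31250, 0.31250, 0.15625, 0.03125]

print(variance_by_deviations(probs, values))
print(variance_by_shortcut(probs, values))
```

Both calls should agree with the value .05 found in Table 2.3, apart from rounding in the last decimal places of floating-point arithmetic.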


EXERCISE 2.4 

For each of the following probability distributions compute $\mu = E(X)$
and $\sigma^2 = E(X - \mu)^2$.

1. The distribution in Table 1.1 on page 4 ($\mu = 1$, $\sigma^2 = 1$).

2. The distribution of p in samples of 5 from a population in which
P = .8, using the probabilities given in column 5 of Table 2.2.
($\mu = .8$, $\sigma^2 = .032$.)

3. The distribution of the number of successes (X = Np) in samples
of 5 from a population for which P = .5. Use the probabilities from
Table 2.2. ($\mu = 2.5$, $\sigma^2 = 1.25$.)

4. The distribution of the number of successes (X = Np) in samples
of 5 from a population for which P = .8. ($\mu = 4$, $\sigma^2 = .8$.)

5. The distribution in Table 3.1 on page 45. ($\mu = .5$, $\sigma^2 = .025$.)


Mean and Standard Deviation of a Sample. The notation 
and formulas used in the preceding two sections are applicable 
to the calculation of means and standard deviations of theoretical 
distributions. Other notation and formulas will be used in the 
calculation of these characteristics in samples. A discussion of 
these is deferred to Chapter 5. 

Mean and Standard Deviation of the Binomial Distribution. 
In the preceding sections we dealt in general with the mean, 
variance, and standard deviation of any probability distribution. 
However when the probability distribution has the binomial form 
the computation can be greatly simplified. Let X represent the 
number of individuals having a given trait in a sample of N in- 
dividuals from a population in which P is the proportion having 
that trait. Then the mean is 


(2.10)  $\mu_X = E(X) = NP,$

the variance is

(2.11)  $\sigma_X^2 = E(X - \mu)^2 = NPQ$

and the standard deviation is

(2.12)  $\sigma_X = \sqrt{E(X - \mu)^2} = \sqrt{NPQ}$


The reader may gain understanding of these formulas by applying
them in problems 3 and 4 of Exercise 2.4 where they should
give results identical with those obtained by the longer method
of computation.
Let p represent the proportion of individuals having a given 
trait in a sample of N individuals from a population in which P 
is the proportion having that trait. The mean of the sampling 
distribution of p has already been given in Formula (2.6) as 


$\mu_p = E(p) = P$

The variance is

(2.13)  $\sigma_p^2 = E(p - P)^2 = \dfrac{PQ}{N}$

and the standard deviation is

(2.14)  $\sigma_p = \sqrt{E(p - P)^2} = \sqrt{\dfrac{PQ}{N}}$


The reader may gain understanding of these formulas by applying 
them in problems 2 and 5 of Exercise 2.4. 
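
As a check on these shortcuts, the following Python sketch computes the mean and variance term by term from the binomial probabilities and sets them beside NP, NPQ, and PQ/N. The case N = 5, P = .8 is the one used in Exercise 2.4; the function name `binomial_moments` is an illustrative assumption.

```python
from math import comb

def binomial_moments(n, p):
    """Mean and variance of the number of successes X, computed term by term."""
    probs = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(pr * x for x, pr in enumerate(probs))
    variance = sum(pr * x**2 for x, pr in enumerate(probs)) - mean**2
    return mean, variance

N, P = 5, 0.8
Q = 1 - P
mean_x, var_x = binomial_moments(N, P)

# Number of successes X: Formulas (2.10) and (2.11).
print(mean_x, N * P)
print(var_x, N * P * Q)

# Proportion p = X / N: Formulas (2.6) and (2.13).
print(mean_x / N, P)
print(var_x / N**2, P * Q / N)
```

Each printed pair should agree except for floating-point rounding. Dividing X by N divides the mean by N and the variance by N squared, which is how NPQ becomes PQ/N.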


EXERCISE 2.5 

1. Suppose a class which is studying sampling distributions decides 
that each member will spin each of 10 pennies once and record the number 
of heads which appear. Assume the pennies to be relatively free from bias 
so that the probability of a head appearing on one particular spin of one 
penny is P = .5. 

a. What is the expected number of heads on each spin of 10 pennies? 

b. Will this expected number be affected by the number of repetitions 
of the experiment? Will it be related to the number of persons in the class?

c. Will this expected number be changed if each person spins 20 instead 
of 10 pennies at one time? 

d. Suppose 500 students carry out the experiment and each records
as an observation the number of heads on one throw of 10 pennies. Imagine
the frequency distribution of these observations. An observation may
have any value from 0 to 10. Can you state approximately the value of
the variance of these observations? Why would this result be only an
approximation? Would the approximation be better or worse if 10,000
students report observations than if 500 do so (assuming of course that the
10,000 observations are made with equal care)?

e. If each student throws 20 pennies instead of 10 would that change
the variance discussed in question 1d?

2. Suppose 500 persons agree to take part in the experiment described
in the preceding question (or a smaller number of persons make repeated
throws until there are 500 records).

a. What are the expected frequencies for the distribution of heads on
the 500 throws? Let X = number of heads appearing on one throw. Then
$P(X_i)$ is the probability obtained by taking the appropriate term in the
expansion of $(.5 + .5)^{10}$. The expected frequency for a given $X_i$ is $500P(X_i)$
and is the appropriate term in the expansion of $500(.5 + .5)^{10}$. Complete
the entries in columns F, FX, and FX^2.

P(X)         F         FX        FX^2
 .00098       .49       4.9       49
 .00977      4.88      43.9      395.1
 .04395     21.98     175.8
 .11719
 .20508
 .24609
 .20508
 .11719
 .04395
 .00977
 .00098
1.00003    500.03


b. Compute the mean and the variance for the distribution in question
2a. Compare these answers with the values for $\mu$ and $\sigma$ obtained by use
of Formulas (2.10) and (2.12).

c. Would the outcome of question 2a have been different if there had
been 100 repetitions instead of 500?

d. If the experiment were carried out as described could you be confident
that the distribution of the number of heads in 500 replications
would exactly conform to the frequency distribution of question 2a? Why?


Approximation to the Binomial Distribution. The computa- 
tions made and the graphs drawn in Exercise 2.3 suggest the fol- 
lowing generalizations: 

a. If P = .5, the graph of the binomial distribution is symmetrical
regardless of the size of N.

b. If P is not .5 the graph of the binomial distribution is
asymmetrical, or skewed, when N is small but becomes more
nearly symmetrical as N increases.

c. As N increases, the steps in the graph of the binomial distribution
become smaller and the graph takes on more the appearance
of a smooth curve.

We may even go further and say that if P is not too near 0
or 1 and if N is large, the smooth curve called the “normal curve”
provides a very good approximation to the binomial. The sum
of any number of terms of the binomial can then be obtained
approximately from the corresponding area under the normal
curve. The larger the value of N the better the approximation is.
In fact, the first treatise ever written about the normal curve
appeared under the title Approximatio ad Summam Terminorum
Binomii $(a + b)^n$ in Seriem Expansi. Being translated this is
“An Approximation to the Sum of the Terms of the Binomial
$(a + b)^n$ Expanded in Series.” In this seven-page pamphlet
written in Latin and dated 1733, Abraham De Moivre wrote
“although the solution of problems of chance often require that
several terms of the binomial $(a + b)^n$ be added together, nevertheless
in very high powers the thing appears so laborious, and of
so great difficulty that few people have undertaken the task.”
Then he proceeds to obtain the formula for the ordinate of the
probability curve and for the proportion of area lying between
ordinates at selected points.

The Normal Distribution. The normal probability distribution
in the adjacent sketch is a probability distribution of a continuous
variable. The scale for values of this variable is located
on the base line. For a continuous probability distribution one
does not speak of a probability corresponding to a particular value of
the variable, but rather of the probability that a value of the
variable is less than a given number or more than a given number,
or that it occurs between two given numbers. The probability
is then an area under the curve. The probability that a value
of the variable is less than A is the area under the curve, above
the base line, and to the left of the ordinate at A. The probability
that a value of the variable is between A and B is the area under
the curve, above the base line, and between the ordinates at A
and B. Since the normal distribution is a probability distribution
the total area under the curve is one.

[Sketch: normal curve with the points A, O, and B marked on the base line.]

For the normal distribution the point O, shown in the sketch,
is at the same time the mean, median, and mode. It is most often
referred to as the mean. For convenience in reading tables related
to the normal curve, the mean of the distribution is taken
as the origin from which values of the variable are measured.
Scale values are most conveniently expressed in standard deviation
units. When the origin of a normal probability distribution
is placed at the mean ($\mu$) and values are scaled in terms of the
standard deviation ($\sigma$), the distribution is called unit normal.
The unit normal distribution has mean zero and variance 1. In
this book the letter z will be used to denote a value of a variable
so distributed.

If X is any normally distributed variable with mean $\mu$ and
standard deviation $\sigma$, then

(2.15)  $z = \dfrac{X - \mu}{\sigma}$


It is convenient to attach a subscript to the letter z to indicate
what proportion of the area under the normal curve lies to the
left of an ordinate drawn at z. In the adjacent sketch 30% of the
area lies to the left of an ordinate at $z_{.30}$ and 90% to the left of an
ordinate at $z_{.90}$; therefore $z_{.30}$ is the 30th percentile of the curve
and $z_{.90}$ is the 90th percentile.

[Sketch: normal curve with ordinates at $z_{.30}$ and $z_{.90}$, 30% of the area lying to the left of the first.]

Reading a Table of Normal Probability. Tables I, II and III
in the Appendix may be used for obtaining the probabilities of
the normal distribution.

In Table I are shown area values corresponding to given
abscissa values ranging from -4 to +4. Points to the left of
the origin have negative abscissas and these are listed in the first
column under the heading $z_\alpha$. Then $\alpha$ denotes the area to the left
of an ordinate at $z_\alpha$. Points to the right of the origin have positive
abscissas and these are listed in the second column under the
heading $z_{1-\alpha}$. The area to the left of an ordinate at $z_{1-\alpha}$ is $1 - \alpha$
and the area to the right is $\alpha$. Thus $z_\alpha = -z_{1-\alpha}$. The area outside
the ordinates at $z_\alpha$ and $z_{1-\alpha}$ is $2\alpha$; the area between these two
ordinates is $1 - 2\alpha$. Thus the table indicates that the area to
the left of an ordinate at z = -.9 is .184. This means that -.9 is
approximately the 18th percentile of the normal curve. This table is
convenient to use when a value of z is known and the corresponding
probability value is required.
Verify the following statements by identifying the appropriate
entry in Table I.

P(z < -2.5) = .006        P(z < -2.1) = .018
P(z <  2.5) = .994        P(z <  2.1) = .982
P(z <  -.4) = .345        P(z < -2.6) = .005
P(z <   .4) = .655        P(z <  2.6) = .995

The percentile rank of z = -2.5 is .6
The percentile rank of z = .4 is 65.5

z = 2.1 is the 98th percentile.
z = -2.1 is the 2d percentile.
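
A reader without the Appendix at hand can reproduce entries of the kind found in Table I with Python's standard library. This is a minimal sketch, assuming Python 3.8 or later for `statistics.NormalDist`; the helper name `area_to_left` is an illustrative choice, and the computed areas are rounded to three decimals as in the statements above.

```python
from statistics import NormalDist

unit_normal = NormalDist(mu=0.0, sigma=1.0)

def area_to_left(z):
    """Area under the unit normal curve to the left of the ordinate at z."""
    return unit_normal.cdf(z)

for z in (-2.5, 2.5, -0.4, 0.4, -2.1, 2.1, -2.6, 2.6):
    print(f"P(z < {z:5.1f}) = {area_to_left(z):.3f}")
```

Multiplying an area by 100 gives the percentile rank of the corresponding z.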


In Table II the roles of the two columns in Table I are reversed, 
making Table II particularly convenient to use when a probability 
value is known and the corresponding value of z is required. 
Thus, if we want to know the 30th percentile of the unit normal 
curve we want a value of z such that 30% of the area lies to the 
left of the ordinate at that point, and this is read directly from 
the table as $z_{.30} = -.524$. If we had tried to obtain this same
value from Table I we could only have noted that .30 which is 
the given area is between the tabulated values .308 and .274 and 
that therefore the required value of z is between — .5 and — .6. 
Interpolation is not linear in these tables and could not be relied 
upon for more than one additional digit. It is much less laborious 
and more accurate to use Table II than to try to interpolate in 
Table I and vice versa. Verify the following statements by iden- 
tifying the appropriate entry in Table II, then look at Table I 
to see how nearly you could have estimated the same value 
from it. 

For the unit normal curve 


the 10th percentile is — 1.282 
the 44th percentile is — .151 
the 50th percentile is 0 

the 95th percentile is 1.645 
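
The lookup performed with Table II, from a known area to the corresponding z, is the inverse of the Table I lookup. A minimal Python sketch follows, assuming Python 3.8 or later for `statistics.NormalDist.inv_cdf`; the function name `percentile_of_unit_normal` is illustrative.

```python
from statistics import NormalDist

def percentile_of_unit_normal(area):
    """Return the z for which the given proportion of area lies to its left."""
    return NormalDist().inv_cdf(area)

for area in (0.10, 0.30, 0.44, 0.50, 0.95):
    z = percentile_of_unit_normal(area)
    print(f"z_{area:.2f} = {z:7.3f}")
```

The areas supplied must lie strictly between 0 and 1; the library raises an error at 0 or 1, where no finite z exists.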


Table III is set up in a fashion similar to that of Table I but
with these modifications. The z column has been headed $x/\sigma$
and entries are at intervals of .01 instead of .1. It is to be understood
that $x/\sigma$ has here the same meaning as z in Table I. Areas
are expressed as areas between the ordinate at z and the ordinate
at the mean instead of areas to the left of the ordinate at z.
Thus for z = -1.20 Table I would give the area as .115 and Table
III would give it as .5 - .3849 = .1151. For z = -.65, interpolation
in Table I would give the area as ½(.274 + .242) = .258
while Table III would give .5 - .2422 = .2578. To estimate the
same value from Table II we could note that the tabular entry
nearest to -.65 is -.6745 and the corresponding area is .25.

The reader should be warned that practice in texts is not uni- 
form as to what areas of the normal curve are tabulated, and so 
a full understanding of the relations involved is necessary if one 
is to make correct use of whatever tables are at hand. In most 
tables the maximum ordinate is given as .3989 as in Table III, 
but in some tables the maximum ordinate is given as 1.00 and
all other ordinates are expressed as proportions of the maximum. 
Such a table is a table of the normal curve but not of the unit 
normal. 

It is important that the reader understand certain relationships,
which will be briefly stated here. Exercise 2.6 provides
practice in recognizing these.

The subscript of z denotes an area while z denotes a distance 
on the baseline measured from the mean, or it denotes a point on 
the baseline at distance z from the mean. 


If the subscript is less than .50, z is negative.

If the subscript is exactly .50, z is zero.

If the subscript is greater than .50, z is positive.

In subsequent chapters the small Greek letter $\alpha$ (alpha) will
often be used to designate a small area in one of the tails of a
probability curve, as in the adjacent sketch. Then

$z_\alpha = -z_{1-\alpha}$
$-z_\alpha = z_{1-\alpha}$
$z_\alpha + z_{1-\alpha} = 0$

[Sketch: normal curve with the points $z_\alpha$ and $z_{1-\alpha}$ marked on the base line.]


The reader must be warned that the numerical subscript is
used with a somewhat different meaning in most other texts. In
later chapters we shall deal with the probability distributions of
several other variables, in particular with variables called t, $\chi^2$,
and F. In the symbolism used in this text (which is the same as
that used by Dixon and Massey in An Introduction to Statistical
Analysis, McGraw-Hill, 1951), $t_{.05}$, $\chi^2_{.05}$, and $F_{.05}$ each means the
fifth percentile of the corresponding probability distribution. In
the symbolism commonly used in other texts, $t_{.05}$ is the 97.5 percentile,
$\chi^2_{.05}$ and $F_{.05}$ are the 95th percentiles, while the 5th percentiles
of those distributions are denoted as $-t_{.10}$, $\chi^2_{.95}$, and $F_{.95}$.
Obviously the notation introduced by Dixon and Massey and
followed in this text is much more consistent and less confusing.


EXERCISE 2.6 

1. By reference to Tables I, II, or III verify each of the following
statements and indicate the number of the table which can be most
conveniently used.

a. P(z > 2.05) = .5 - .4798 = .0202                (III)
b. P(z < -.2) = .421                               (I)
c. P(0 < z < 1.32) = .4066                         (III)
d. P(-.64 < z < 0) = .2389                         (III)
e. P(z < .5) = .692                                (I)
f. P(z < .52) = .5 + .1985 = .6985                 (III)
g. P(z > .57) = .5 - .2157 = .2843                 (III)
h. P(-.3 < z < .3) = 1 - 2(.382) = .236            (I)
                   = 2(.1179) = .2358              (III)
i. P(z < -.3 or z > .3) = 2(.382) = .764           (I)
                        = 1 - 2(.1179) = .7642     (III)
j. P(z < -2.5 or z > 2.5) = 2(.006) = .012         (I)
                          = 1 - 2(.4938) = .0124   (III)
k. $z_{.05} = -1.645$                              (II)
l. $z_{.98} = -z_{.02} = 2.054$                    (II)
m. $z_{.001} = -3.09$                              (II)
n. $z_{.998} = -z_{.002} = 2.878$                  (II)


2. Verify the following statements and translate them into words:
230 = — 2,70 f. 2а +250 = 0 

22 = — 2.88 . & 20 +20 = 0 

2.9 = ¬ 210 h. 2.5 = — 215 

2.95 = — 2.0% i. 2.995 = — 2.005 

20+ 2.0 = 0 j. 203+ 2.97 = 0 


3. Without reference to any table, state the value of the following:
P(z < гм) i P(z > 2.97) - 

P(z > гв) j. Plz > zas) 

P(z < г») k. P(z < 2.05 or z > 2.95) 

Р(20 < 2 < 24) L Plz < е) 

P(zo < 2) m. P(z > гв) 

P(z.0 < г) n. P(z < zon or z > 2.999) 

P(z > zs) о. Pe < 2.0 or z > 2.9) 

. Pz < 2 < гм) 


4. If A represents some particular number find the value of A for which
each of the following statements is satisfied.

a. P(z > A) = .15                    f. P(-A < z < A) = .95
b. P(z < A) = .04                    g. P(z > A) + P(z < -A) = .01
c. P(-A < z < A) = .30               h. P(z > A) = .995
d. P(z > A) + P(z < -A) = .50        i. P(z < A) = .975
e. P(z > A) = .05                    j. P(-A < z < A) = .50


Computing Binomial Probabilities by Use of Normal Proba- 
bility Tables. Since the normal distribution approximates closely 
the binomial, the table of areas under the normal curve may be 
used to compute probabilities of the binomial distribution. 

Consider the following problem. In 100 fair throws of an un- 
biased coin what is the probability that heads will be exposed 
60 times or more? 

This is a sample of 100 from a population of two classes for
which P = Q = .5. The exact probability distribution is given by
the terms of the binomial for which N = 100 and P = .5. To
calculate exactly the probability called for would require adding
the terms

$\binom{100}{60}\left(\tfrac{1}{2}\right)^{100} + \binom{100}{61}\left(\tfrac{1}{2}\right)^{100} + \cdots + \binom{100}{99}\left(\tfrac{1}{2}\right)^{100} + \binom{100}{100}\left(\tfrac{1}{2}\right)^{100}$

To avoid this formidable task, the normal curve may be used to
obtain an approximation, and a very good approximation. This
approximation is accomplished by means of the statistic

(2.16)  $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{N}}}$


Notice that this statistic is zero at the mean of the distribution
when p = P; it has unit standard deviation because of its denominator,
and its distribution is approximately normal. Consequently
it can be treated like the z of the unit normal distribution.

Returning now to the problem stated at the beginning of this
section, we have P = .5, p = .6, and N = 100, hence

$z = \dfrac{.6 - .5}{\sqrt{\dfrac{(.5)(.5)}{100}}} = \dfrac{.1}{.05} = 2$

Table I shows that P(z > 2) = .023. Hence the chance that 60 or
more throws will show heads is approximately .023.
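
The “formidable task” of adding the binomial terms is no longer formidable for a computer, so the approximation can be set beside the exact sum. The sketch below is an illustration only; the variable names are assumptions, and `statistics.NormalDist` (Python 3.8 or later) stands in for Table I.

```python
from math import comb, sqrt
from statistics import NormalDist

N, P = 100, 0.5
Q = 1 - P

# Exact probability of 60 or more heads: the sum of the binomial terms.
exact = sum(comb(N, x) * P**x * Q**(N - x) for x in range(60, N + 1))

# Normal approximation by Formula (2.16) with p = .6.
z = (0.6 - P) / sqrt(P * Q / N)
approx = 1 - NormalDist().cdf(z)

print(round(z, 2), round(approx, 3), round(exact, 3))
```

The exact sum comes out somewhat larger than the approximate value, because the statistic is evaluated at p = .6 itself rather than at the lower edge of the interval that represents 60 heads when X is treated as continuous.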
Consider also the following example: A mental test of 80
multiple-choice items each having 5 alternatives is presented to
a subject who knows nothing whatever about the topic and guesses
the answers to all of the questions. What is the chance that he
will answer 24 or more of the questions correctly?

Here P = 1/5 = .2, p = 24/80 = .3, and N = 80, so

$z = \dfrac{.3 - .2}{\sqrt{\dfrac{(.2)(.8)}{80}}} = 2.24$

The chance that the subject will answer 24 or more questions
correctly by guessing is approximately P(z > 2.24) = .0125 by Table III.
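
The same comparison can be made for the guessing example, here wrapped in two small functions so that other values of N, P, and the number of successes may be tried. The function names are illustrative assumptions. Note that the sketch carries z unrounded, so its normal area differs slightly from the .0125 obtained from Table III with z rounded to 2.24.

```python
from math import comb, sqrt
from statistics import NormalDist

def upper_tail_exact(n, p, k):
    """Exact binomial probability of k or more successes in n trials."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k, n + 1))

def upper_tail_normal(n, p, k):
    """Formula (2.16) with the sample proportion taken as k / n."""
    z = (k / n - p) / sqrt(p * (1 - p) / n)
    return z, 1 - NormalDist().cdf(z)

z, approx = upper_tail_normal(80, 0.2, 24)
exact = upper_tail_exact(80, 0.2, 24)

print(round(z, 2), round(approx, 4), round(exact, 4))
```

With P as far from .5 as .2 the binomial is skewed, and the reader may expect the agreement to be less close than in the coin example.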

The Closeness of Approximation of the Normal Curve to the
Binomial. In order to see how well the normal curve fits a binomial
distribution we may consider a small sample in which actual
probabilities can be computed and compared. Consider the
probability distribution of X in a sample of 10 cases where X runs
from 0 to 10 with probabilities given by the binomial distribution
for which P = .5 and N = 10. X is a discrete variable with
probabilities as given in the following list.

 X     P(X)
10     .001
 9     .010
 8     .044
 7     .117
 6     .205
 5     .246
 4     .205
 3     .117
 2     .044
 1     .010
 0     .001

For the sake of comparison of binomial and normal distributions,
X is represented in Figure 2-10 as a continuous variable. Thus
instead of the point X = 6 we have the interval 5.5 < X < 6.5, and
instead of X = 0 we have the interval -.5 < X < .5.

Fig. 2-10. Binomial probability distribution with P = .5 and
N = 10 and normal distribution with mean = PN = 5 and standard
deviation = $\sqrt{NPQ}$ = 1.58.

In such comparisons it is more effective to use
cumulative distributions. The cumulative binomial
probability that X will be no greater than a given value is shown
in the second column of Table 2.4. Thus

P(X = 0 or 1 or 2 or 3) = P(X < 3.5)
= .001 + .010 + .044 + .117 = .172


To obtain the corresponding probabilities on the assumption that
X is normally distributed we must first find the mean and standard
deviation of the binomial distribution. By Formula (2.10),
$\mu = 5$ and by Formula (2.12) $\sigma = \sqrt{2.5} = 1.58$. We must next
find the z value corresponding to each division point (such as
.5, 1.5, 2.5, etc.) between intervals. For example, if X = 1.5,

$z = \dfrac{1.5 - 5}{1.58} = -2.21$

These z values are entered in column 3 of Table 2.4. The normal
probability, that is, the area to the left of the ordinate at z, is
shown in the final column of this table.

The two cumulative probability distributions tabulated in
Table 2.4 are shown graphically in Figure 2-11. The values of
X listed in Table 2.4 are the midpoints of intervals in the graph.
At these points the smooth curve and the step curve are very
close together, indicating that the smooth curve provides a good
approximation to the step curve.

Fig. 2-11. Cumulative binomial probability distribution with P = .5
and N = 10 and cumulative normal distribution with mean = PN = 5, and
standard deviation = $\sqrt{NPQ}$ = 1.58.


TABLE 2.4  Cumulative Probability for the Binomial Distribution with P = .5
and N = 10 and for the Normal Distribution with $\mu = 5$ and $\sigma = 1.58$

X0 = Point Midway between Two Adjacent Values of Np

         Cumulative                           Cumulative
X0       binomial       z = (X0 - 5)/1.58     normal
         probability                          probability
9.5        .999              2.85               .998
8.5        .989              2.21               .986
7.5        .945              1.58               .943
6.5        .828               .95               .829
5.5        .623               .32               .625
4.5        .377              -.32               .375
3.5        .172              -.95               .171
2.5        .055             -1.58               .057
1.5        .011             -2.21               .014
 .5        .001             -2.85               .002
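
The construction of Table 2.4 can be sketched in a few lines of Python. The function name `cumulative_comparison` and its return layout are illustrative assumptions; z is computed here from the unrounded standard deviation, so an occasional entry may differ from the printed table in the last digit.

```python
from math import comb, sqrt
from statistics import NormalDist

def cumulative_comparison(n, p):
    """Rows of (X0, cumulative binomial, z, cumulative normal) for X0 = .5, 1.5, ..."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    rows = []
    running = 0.0
    for x in range(n):                       # X0 lies midway between x and x + 1
        running += comb(n, x) * p**x * (1 - p)**(n - x)
        x0 = x + 0.5
        z = (x0 - mu) / sigma
        rows.append((x0, running, z, NormalDist().cdf(z)))
    return rows

# Print from the largest X0 downward, as the table is arranged.
for x0, cum_binom, z, cum_norm in reversed(cumulative_comparison(10, 0.5)):
    print(f"{x0:4.1f}  {cum_binom:.3f}  {z:6.2f}  {cum_norm:.3f}")
```

Calling `cumulative_comparison(10, 0.8)` gives the corresponding comparison for a skewed binomial.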


By way of contrast, the same treatment has been applied to
the binomial with P = .8 and N = 10. The probability histogram
and superimposed normal curve are shown in Figure 2-12, the
cumulative probabilities are listed in Table 2.5, and cumulative
probability curves drawn in Figure 2-13. Examination of Figure
2-12 reveals that the normal curve is symmetrical while the histogram
is not. Examination of Figure 2-13 reveals that at the midpoints
of intervals the smooth curve and the step curve do not
coincide as well as they did in Figure 2-11. Even here, however,
the discrepancy is less than many people would have expected.

TABLE 2.5  Cumulative Probability for the Binomial Distribution with P = .8
and N = 10 and for the Normal Distribution with $\mu = 8$ and $\sigma = 1.26$

X0 = Point Midway between Two Adjacent Values of Np

         Cumulative                           Cumulative
X0       binomial       z = (X0 - 8)/1.26     normal
         probability                          probability
9.5        .892              1.19               .883
8.5        .624               .40               .655
7.5        .322              -.40               .345
6.5        .121             -1.19               .117
5.5        .033             -1.98               .024
4.5        .007             -2.77               .003
3.5        .001             -3.56               .000
2.5        .000             -4.35               .000
1.5        .000             -5.14               .000
 .5        .000             -5.93               .000

Fig. 2-12. Binomial probability distribution with P = .8 and N = 10 and
normal distribution with mean = PN = 8 and standard deviation = $\sqrt{NPQ}$
= 1.26.

Fig. 2-13. Cumulative binomial probability distribution with P = .8
and N = 10 and cumulative normal distribution with mean = PN = 8, and
standard deviation = $\sqrt{NPQ}$ = 1.26.

REVIEW EXERCISE 

1. This chapter has introduced a number of technical terms which 
are new to most readers. To phrase a completely satisfactory formal 
definition of some of these terms would require more erudition than is to 
about population per cents is endless. Any reader can quickly
think of many such in his own field of study.

The content which may be explored in problems of the kind 
discussed in this chapter is practically inexhaustible. The types 
of such problems are relatively few. Therefore, it seems economical
of the student's effort to organize the presentation in
terms of types of problems with one illustration for each type, 
and to expect that the reader can apply the method to similar 
situations in the content of his own field. 

Test of Hypothesis P = .5 by Use of the Binomial Distribution. 
In order to make the discussion concrete let us return to the prob- 
lem of selecting the more intelligent of two persons by comparing 
their photographs. This problem was introduced at the end of 
Chapter 1 by considering an experiment based on five pairs of 
photographs. The analysis will be more interesting if the experi- 
ment is now extended to include more pairs of photographs, say 10. 

It was pointed out in Chapter 2 that the population studied 
by means of this experiment is a dichotomous or two-class popula- 
tion. The parameter P of the population is the probability of 
correctly selecting the more intelligent member of any random pair. 

A variety of hypotheses might be formulated about this un- 
known P. Let us consider first the hypothesis that the experi- 
menter has no ability to distinguish levels of intelligence by 
comparing photographs, so that a correct comparison is as likely 
as an incorrect comparison. In terms of the parameter the hypothesis
is P = .5.

The consequences of this hypothesis are stated in terms of 
the sampling distribution of the statistic p which is, for an ob- 
served sample, the proportion of comparisons in which the correct 
selection is made. This distribution of p appears in Table 3.1.
It is a special case of the binomial distribution shown in Table 
2.1 when N = 10 and P = .5. The reader will note that the sam- 
pling distribution of p was constructed without knowledge of the 
proportion observed in any actual sample. The distribution sup- 
plies a statement of the probability, if the hypothesis is true, of 
obtaining in a sample of the given size one of the eleven possible 
values of p. 

The hypothesis P = .5 can now be tested by actually perform- 
ing the experiment with ten pairs of photographs and comparing 
the observed proportion of correct selections with the sampling 
distribution of p in Table 3.1. 
Under the hypothesis that correct selections will be made in
approximately half the comparisons (P = .5) it is reasonable to 
anticipate that in a sample of 10 cases an observed proportion 
p will be not far from .5. If it is very far from .5, say 0 or 1.0, 
most people intuitively feel that the observation contradicts the 
hypothesis, throws doubts upon it, renders it implausible. But 
intuitively, different people might hold different opinions as to 
what “very far from .5” means. The function of the sampling 
distribution is to provide an aid to intuition and to regularize 
intuitive decisions. “Very far from P” will take on different 
meanings as sample size changes and as P changes and so cannot 
provide a dependable rule for dealing with hypotheses. However, 
it can be agreed that an observed sample and a hypothesis are to 
be considered in conflict whenever, under the hypothesis, the 
probability of obtaining a sample as extreme as the one observed 
is not greater than some specified small number, say 1/20 = .05 
or 1/100 = .01. In subsequent chapters various sampling dis- 
tributions will be considered. Regardless of the nature of the 
sampling distribution, or the statistic to which it relates, one may 
always agree to consider that the extreme observations having 
this specified small probability warrant rejecting the hypothesis 
tested. 

TABLE 3.1  The Sampling Distribution of an Observed Proportion p in
Samples of 10 Cases from a Population in which P = .5

         Probability for        Probability for
p        any value of P         P = .5
1.0      $P^{10}$                  1/1024 = .001
 .9      $10P^9Q$                 10/1024 = .010
 .8      $45P^8Q^2$               45/1024 = .044
 .7      $120P^7Q^3$             120/1024 = .117
 .6      $210P^6Q^4$             210/1024 = .205
 .5      $252P^5Q^5$             252/1024 = .246
 .4      $210P^4Q^6$             210/1024 = .205
 .3      $120P^3Q^7$             120/1024 = .117
 .2      $45P^2Q^8$               45/1024 = .044
 .1      $10PQ^9$                 10/1024 = .010
 0       $Q^{10}$                  1/1024 = .001
                                           1.000


According to Table 3.1 we see that under the hypothesis
P = .5 the probability that p = 0 or 1.0 is

P(p = 0 or 1.0 | N = 10 and P = .5) = 1/1024 + 1/1024 = .002

Also the probability that p will be as extreme as .1 or .9 is the
probability that p will be either 0, .1, .9 or 1.0 and this is

P(p = 0, .1, .9, or 1.0 | N = 10 and P = .5)
= 1/1024 + 10/1024 + 10/1024 + 1/1024 = 22/1024 = .022


This probability is so small that practically all research workers
would agree that if one of these values of p is observed in a sample
the hypothesis P = .5 should be rejected. Values of p which,
when observed in a sample, lead to rejection of a hypothesis will
be called rejection values. If the hypothesis P = .5 is rejected
whenever one of these values of p appears in a sample, then in
the long run a true hypothesis will be rejected in 22 out of 1000
samples. Most people consider that 22/1000 = .022 is a small risk
to take. These four values of p are said to constitute a critical
region, or a region of rejection, or region of significance. The
remaining possible values of p, that is, p = .2, .3, .4, .5, .6, .7,
and .8, are then said to constitute a region of acceptance. The
probability of .022 is called the level of significance or the size
of the region of rejection. The probability 1.00 − .022 = .978 is
then the size of the region of acceptance.
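
The sizes of these regions can be checked by direct computation. The following is a minimal sketch in Python, not part of the original text; the names `binomial_pmf` and `region_size` are illustrative assumptions. It works with exact fractions, so the region p = 0, .1, .9, 1.0 comes out as 22/1024, which the text reports as .022 after adding the rounded entries of Table 3.1.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, P):
    """Probability of exactly x successes in n cases when the population proportion is P."""
    P = Fraction(P)
    return comb(n, x) * P**x * (1 - P)**(n - x)

def region_size(counts, n, P):
    """Total probability of a set of success counts (a candidate rejection region)."""
    return sum(binomial_pmf(x, n, P) for x in counts)

# Region p = 0, .1, .9, 1.0 (counts 0, 1, 9, 10) for N = 10 under the hypothesis P = .5
size = region_size([0, 1, 9, 10], 10, Fraction(1, 2))
print(size, float(size))      # exact fraction and its decimal value
print(float(1 - size))        # size of the corresponding region of acceptance
```

Passing a different list of counts, for example `[0, 1, 2, 8, 9, 10]`, gives the size of a wider candidate region under the same hypothesis.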

Some readers may be inclined to include the points .2 and .8 
in the region of rejection. The probability of a sample as extreme 
as these under the hypothesis P = .5 is 


P(p = 0, .1, .2, .8, .9, or 1.0) = 112/1024 = .11


Most research workers are reluctant to adopt a rule which would 
cause them to reject a true hypothesis in 11 out of 100 samples 
and so they would prefer to classify .2 and .8 as acceptance values. 

If a sampling distribution is continuous, as for example the
normal probability curve, the level of significance can be chosen
arbitrarily, and for this purpose some small conveniently rounded
number such as 1/20 or 1/100 is commonly chosen. If a sampling
distribution is discrete such an arbitrarily chosen level of
significance may not conform to any probability which can possibly
be obtained by cumulating probabilities at the extremes of the
distribution. Thus we have just seen that in the present problem,
the level of significance can be set at .022 or at .11 and not
at any intermediate value. If the research worker wants to work
at the .05 level, about the best he can do in such cases is to say he
will make the size of the region of rejection not larger than .05
and the size of the region of acceptance not smaller than .95.
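
The restriction to a few attainable sizes can be made concrete by listing them. The sketch below is an illustration only, assuming Python and the hypothetical helper names `symmetric_region_sizes` and `largest_region_within`; it treats the symmetrical case P = .5, adding one point at a time to both tails, and then picks the largest region whose size does not exceed a requested level.

```python
from math import comb

def symmetric_region_sizes(n):
    """Sizes of the two-tailed regions {0..k} and {n-k..n} under P = .5, for k = 0, 1, ..."""
    sizes = []
    tail = 0
    for k in range((n + 1) // 2):          # stop before the two tails would overlap
        tail += comb(n, k)                 # number of outcomes in one tail
        sizes.append((k, 2 * tail / 2**n))
    return sizes

def largest_region_within(n, alpha):
    """Largest k whose two-tailed region has size not larger than alpha, or None."""
    chosen = None
    for k, size in symmetric_region_sizes(n):
        if size <= alpha:
            chosen = (k, size)
    return chosen

for k, size in symmetric_region_sizes(10):
    print(k, round(size, 3))
print(largest_region_within(10, 0.05))
```

For N = 10 and a requested level of .05 the helper settles on the region whose most extreme counts are 0, 1, 9, and 10, since the next larger region already has size about .11.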
In Figure 3-1, the region of rejection consists of the points
marked with heavy dots. The region of acceptance consists of
the points marked with circles. The size of each region is
represented by the sum of the ordinates drawn at points in that region.
Suppose that among comparisons of 10 pairs of photographs,
correct selections have been made in 9 pairs. The observed value
p = .9 falls in the critical region, the region of rejection.
Therefore the hypothesis P = .5 must be relinquished.


[Fig. 3-1. Rejection values (heavy dots) and acceptance values
(circles) for hypothesis P = .5 and associated probabilities for
samples of 10 cases. Horizontal axis: sample value of p; vertical
axis: probability.]

The critical region can be located directly in the cumulative
probability graph of Figure 3-2. Suppose you have decided to
use a critical region of size α and to locate half of it in each tail.
For illustration, let us assume you have decided on α = .05. On
the vertical axis mark the point corresponding to ½(.05) = .025,
and draw a horizontal line through it. This horizontal line meets
the step graph of Figure 3-2 at p = .2. All values of p to the left
of this intersection, not including p = .2, form the left critical
region with probability not greater than .025. On the vertical
axis mark the point corresponding to 1 − ½(.05) = .975 and draw
a horizontal line through it. This horizontal line meets the step
graph in the point p = .8. All values of p to the right of this
intersection, not including p = .8, form the right critical region
with probability not greater than .025. As in Figure 3-1, heavy
dots have been used to represent rejection values.

[Fig. 3-2. Rejection values (heavy dots) and acceptance values
(circles) for hypothesis P = .5 and associated cumulative
probabilities for samples of 10 cases. Horizontal axis: scale of p.]

Tests of Several Hypotheses on P. Inasmuch as the hypothesis
P = .5 has proved unacceptable, alternative hypotheses
may be tried. Any value of P, other than .5, between 0 and 1
may be an alternative. As the sample is finite, p can take values
at the points 0/N, 1/N, 2/N, ..., (N − 1)/N, N/N and at no other
points. Therefore the scale for p consists of discrete points.
While P is a constant for any one population, we are here
considering all possible dichotomous populations, and so the scale
for P is a continuum extending from 0 to 1. For certain selected
hypothetical values of P the probability distribution of p has
been computed and is shown in Table 3.2. Each horizontal row
of the table is a sampling distribution for which the sum would
be 1.00 except for rounding errors. The vertical columns are not
probability distributions.

We shall now seek rejection values for each of these distributions
and shall represent them graphically in Figure 3-3 in
such a way as to be able to study them all at one time in one
unified picture.

TABLE 3.2 Sampling Distribution of an Observed Proportion p in Samples
of 10 Cases for Selected Values of P
(Decimal points omitted from the probabilities to save space)

                  Probability of a given value of p
   P    0   .1   .2   .3   .4   .5   .6   .7   .8   .9  1.0    Sum

 .95                                  001  010  075  315  599   1000
 .90                             002  011  057  194  387  349   1000
 .85                        001  008  040  130  276  347  197    999
 .80                   001  006  026  088  201  302  268  107    999
 .75                   003  016  058  146  250  282  188  056    999
 .70              001  009  037  103  200  267  233  121  028    999
 .65         001  004  021  069  154  238  252  176  072  013   1000
 .60         002  011  042  111  201  251  215  121  040  006    999
 .55         004  023  075  160  234  238  166  076  021  003   1000
 .50    001  010  044  117  205  246  205  117  044  010  001   1000
 .45    003  021  076  166  238  234  160  075  023  004        1000
 .40    006  040  121  215  251  201  111  042  011  002         999
 .35    013  072  176  252  238  154  069  021  004  001        1000
 .30    028  121  233  267  200  103  037  009  001              999
 .25    056  188  282  250  146  058  016  003                   999
 .20    107  268  302  201  088  026  006  001                   999
 .15    197  347  276  130  040  008  001                        999
 .10    349  387  194  057  011  002                            1000
 .05    599  315  075  010  001                                 1000

Suppose it has been agreed that the region of rejection shall
have probability .025 or less at each end of the distribution. Now
let us see how this would work out for a particular row of Table
3.2, say the row for which P = .70. The sum of the probabilities
for p = 0, p = .1, p = .2 and p = .3 is approximately .010 which
is considerably less than the specified .025. However if the
probability for p = .4 is included the sum becomes .047 which is
considerably larger than the specified .025. On Figure 3-3 heavy
dots mark rejection values. These are so chosen that the
probability for the region of rejection in each tail is .025 or less.
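
The rule of keeping each tail at .025 or less can be applied mechanically to any hypothetical value of P. The sketch below is an illustration in Python rather than the authors' procedure; the function name `tail_rejection_counts` and its `tail_limit` parameter are assumptions introduced here. It cumulates probabilities inward from each end and stops just before the running sum would exceed the limit.

```python
from math import comb

def pmf(x, n, P):
    """Binomial probability of x successes in n cases."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def tail_rejection_counts(n, P, tail_limit=0.025):
    """Return (lower, upper): success counts rejected in each tail,
    chosen so that each tail's total probability is at most tail_limit."""
    lower, cum = [], 0.0
    for x in range(n + 1):                 # cumulate from the low end
        cum += pmf(x, n, P)
        if cum > tail_limit:
            break
        lower.append(x)
    upper, cum = [], 0.0
    for x in range(n, -1, -1):             # cumulate from the high end
        cum += pmf(x, n, P)
        if cum > tail_limit:
            break
        upper.append(x)
    return lower, sorted(upper)

print(tail_rejection_counts(10, 0.70))
print(tail_rejection_counts(10, 0.50))
```

For P = .70 the upper list is empty, because the single value p = 1.0 already has probability .028, which exceeds .025.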

In Figure 3-3 regions of rejection have been drawn in similar 
fashion for each of the selected values of P. The probabilities 
associated with these regions are not uniform, as can be seen by 
computing a few of them from the figures in Table 3.2. As a 
kind of compromise among these various regions, the pair of 
curved lines has been drawn on Figure 3-3. Whenever there is 
a region of probability approximately .025 the lines pass near the 
end point of that region. 

These curved lines can be used to obtain a region of rejection
for some value of P not listed in Table 3.2, say P = .47. The
point .47 is located on the vertical axis and a horizontal line
drawn across the chart, cutting the curves at points for which p₁
is approximately .13 and p₂ approximately .85. All values
for which .13 < p < .85 are acceptance values and all points for
which p > .85 or p < .13 are rejection values for the hypothesis
P = .47 at significance level .05. Actually for the discrete
distribution this means that the values 0, .1, .9, and 1.0 are rejection
values with probability somewhat smaller than .05. In this manner
the range of rejection values can be obtained for any value of P.

[Fig. 3-3. Regions of rejection (heavy dots) and acceptance for
testing hypotheses concerning P on evidence from a sample of 10 cases
at .05 level of significance. Horizontal axis: sample value of p;
vertical axis: population value of P.]

A sample with proportion p drawn from a population with
proportion P would be represented on Figure 3-3 by a point with
coordinates (p, P). Such a point is called a sample point. Locate
the following sample points and note that all of them are between
the pair of curved lines and near the upper line:

(p = .4, P = .7), (p = .6, P = .8), (p = .8, P = .9).

Locate the following sample points, all of which are below the
lower curved line:

(p = .6, P = .2), (p = .8, P = .3), (p = .4, P = .05).

Locate the following sample points all of which are above the
upper curved line:

(p = .3, P = .9), (p = .5, P = .85), (p = .6, P = .95).


The area between the curved lines in Figure 3-3 is a region 
of acceptance and the area outside them a region of rejection, 
or critical region. Whenever the two coordinates of a sample 
point (p, P) are such that the point falls in the region of rejection, 
the observed value p is considered to furnish evidence justifying 
the rejection of the hypothesis concerning P. Thus the figure 
furnishes us a short cut to the testing of hypotheses about
proportions. In the problem concerning the judgment of intelligence
from photographs, suppose the experimenter is correct in 9 out
of 10 pairs, so p = .9. The sample point for which p = .9 and
P = .5 lies in the region of rejection. So does every sample point
for which p = .9 and P < .5. Sample points for which p = .9 and
.55 < P < .98 (approximately) are in the region of acceptance.


EXERCISE 3.1 
1. Test each hypothesis (a) to (f) by locating a point on Figure 3-3. 

   If N = 10 and p = .4,
      (a) P = .45            (c) P = .2             (e) P = .57
      (b) P = .9             (d) P = .65            (f) P = .73

   If N = 10 and p = .5,
      (a) P = .53            (c) P = .16            (e) P = .61
      (b) P = .20            (d) P = .73            (f) P = .94

   If N = 10 and p = .2,
      (a) P = .35            (c) P = [illegible]    (e) P = .5
      (b) P = [illegible]    (d) P = .01            (f) P = .49


2. From the answers to the three parts of question 1, formulate a 
general statement as to the range of hypotheses acceptable at the .05 level 
when N = 10 and (a) p= .4; (b) p=.5; (c) p=.2. 


Estimation of P. To this point we have been concerned with 
tests of hypotheses about the value of the population proportion, 
P. A problem which is perhaps of greater interest is to find from 
the information in the sample an estimate of the value of P. We 
shall discuss two methods of estimating P, (1) by a single value 
and (2) by an interval. 

Estimation by a Single Value. On page 28, it was found that
E(p) = P. This means that if an unlimited number of samples
were drawn and p calculated for each of them, even if the number
of cases in each sample were very small, the average of all the
computed values of p would be P. Thus, even though one must
expect to be wrong in nearly every instance, he may expect to be
right on the average, in the long run.

If a statistic has a population parameter as its expected value, 
the statistic is called an unbiased estimate of that parameter. In 
later chapters we shall meet statistics which do not possess this 
highly desirable quality. 

The sample proportion is also called a consistent estimate of
the population proportion because for large samples the value
of p is likely to be close to P. In Chapter 2, the standard
deviation of p was given by Formula (2.14) as
$\sigma_p = \sqrt{PQ/N}$. To see what this means in relation to
the value of N, consider samples of 25, 100, and 1000 cases from
a population in which P is actually .5.

If N = 25, σ_p = .1

If N = 100, σ_p = .05

If N = 1000, σ_p = .0158

consequently

P(.30 < p < .70 | N = 25) = .95
P(.40 < p < .60 | N = 100) = .95
P(.47 < p < .53 | N = 1000) = .95


To put this more concretely, suppose a poll is conducted to 
determine what proportion of residents of a community are in 
favor of a particular issue. If half of them are actually in favor 
of the issue (P = .5), there is probability .05 that in a sample of 
25 cases the observed proportion p will be in error by .20 or more, 
that in a sample of 100 cases it will be in error by .10 or more, 
and that in a sample of 1000 cases it will be in error by .03 or more. 
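
The figures above follow from the formula for the standard deviation of p. The short sketch below recomputes them; it is an illustration in Python, the name `sigma_p` is an assumption, and the multiplier 1.96 reflects the assumption that the sampling distribution of p is treated as approximately normal, which the limits quoted in the text imply but do not state.

```python
from math import sqrt

def sigma_p(P, n):
    """Standard deviation of a sample proportion: sqrt(P * Q / N), with Q = 1 - P."""
    return sqrt(P * (1 - P) / n)

for n in (25, 100, 1000):
    s = sigma_p(0.5, n)
    half_width = 1.96 * s     # assumed normal deviate enclosing the central 95 per cent
    print(n, round(s, 4), round(0.5 - half_width, 2), round(0.5 + half_width, 2))
```

Because N appears under the square root, quadrupling the sample size only halves the standard deviation, as the step from N = 25 to N = 100 shows.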

Interval Estimate. A statistic with the desirable qualities of 
unbiasedness and consistency provides an estimate of the param- 
eter which may be assumed to be numerically close to that 
parameter. An estimate of a parameter which is often more 
satisfactory is one that uses two statistics, unequal in value, which 
jointly provide an interval estimate of the parameter. 

The parameter is stated as lying between these statistics in
much the same way that one might say “I feel confident John’s
age is between 25 and 30.” There are two important differences
between this somewhat casual estimate of John’s age as lying in
the interval 25-30 and the statistical intervals about to be
discussed: (1) In the statistical estimate the degree of confidence
will be decided upon before the interval is obtained and will be
described in numerical terms, whereas in the popular expression
the word “confident” means different things to different people.
(2) In the statistical estimate, the numbers which define the
interval are computed from the data by a rule which guarantees
the degree of confidence to be placed in the interval estimate,
whereas in the popular expression the numbers are more or less
impressionistic.

If one makes the interval wider, he can have greater confidence
that his statement is correct. Thus the estimate that “John’s
age is between 5 and 95” might be made with practically complete
confidence but the interval would be so wide as to be of little
practical value; the estimate that “John’s age is between 26 years,
6 months, and 26 years, 7 months,” could be made with far less
confidence if John’s age is actually unknown.

An interval estimate is a statement about a parameter such 
as “P is between .3 and .5.” The statement is either true or false 
and we do not know which, but the methods by which the state- 
ment is obtained furnish a measure of the confidence with which 
the assertion may be made. That measure of confidence, called 
a confidence coefficient, is usually a value between .90 and 1.00. 
After this confidence coefficient has been chosen, two statistics
are computed from the sample, both from the same sample. For
convenience we shall call these two numbers A and B, with A
less than B. Then the interval estimate of P is the statement

A < P < B.


The numbers A and B are called the confidence limits and the 
interval from A to B is called a confidence interval. 

Interval Estimate for P when N = 10. The abstract discussion
of the preceding paragraph will now be illustrated by a
computation making use of Figure 3-3, which applies to samples of
10 cases. The first step in obtaining the numbers A and B is
to select a confidence coefficient. If we are to make use of Figure
3-3 which has been constructed with α = .05, we must in this
illustration use a confidence coefficient of 1 − .05 = .95. The
relation between the confidence coefficient and α will be explained
below. We shall now assume that in a sample of 10 cases, 3
individuals have been found to possess a particular characteristic,
so p = .3. The proportion P of the population is unknown.
On the baseline of Figure 3-3, locate the observed sample
value, p = .3 and place a ruler or the edge of a card so as to
indicate the vertical line through this point. Note where this line
cuts the two curves. On the vertical scale read the values of
these intersections, calling the smaller value A and the larger B.
For p = .3 we find A = .05 and B = .65. The interval estimate

.05 < P < .65

can therefore be made with confidence coefficient .95. To
distinguish a confidence statement from a probability statement, this
text will use the notation

Conf(.05 < P < .65) = .95
or
C(.05 < P < .65) = .95

This may be expressed in words thus: “We have confidence .95
that the unknown proportion lies between .05 and .65” or “We
assert that P is not smaller than .05 and not larger than .65 and
we have arrived at these numbers by a procedure which if applied
repeatedly would yield interval estimates of which 5% would not
contain the true value and of which the remaining 95% would
contain it.”

To understand why an interval estimate computed by the 
method just described is correct in 95% of samples, note that the 
curves in Figure 3-3 were drawn so that for any given value of
P the probability is .95 that the sample point (p, P) is between
the curves. Since this holds for any value of P it holds for the 
unknown P of the population being sampled. This means that 
for 95% of samples from the same population the values of p will 
determine a sample point which is on the horizontal line through 
P and which lies between the two curves. For all these values 
of p the confidence limits will contain P between them. Hence the 
statement that P is contained between the calculated limits can 
be made with .95 confidence. 
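
The same reasoning can be carried out numerically instead of graphically. The sketch below is an illustration in Python under stated assumptions: the helper names are hypothetical, and it uses the exact equal-tail construction (each tail held to α/2, solved by bisection) rather than the smoothed curves of Figure 3-3, so its limits differ slightly from values read off the chart.

```python
from math import comb

def pmf(x, n, P):
    """Binomial probability of x successes in n cases."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def upper_tail(x, n, P):
    """P(X >= x); this increases as P increases."""
    return sum(pmf(k, n, P) for k in range(x, n + 1))

def lower_tail(x, n, P):
    """P(X <= x); this decreases as P increases."""
    return sum(pmf(k, n, P) for k in range(0, x + 1))

def bisect_increasing(f, target, steps=60):
    """Solve f(P) = target on [0, 1] for a function f that increases with P."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def confidence_limits(x, n, alpha=0.05):
    """Limits A and B: the values of P at which the observed count x
    just reaches the upper or lower rejection tail of size alpha / 2."""
    A = 0.0 if x == 0 else bisect_increasing(lambda P: upper_tail(x, n, P), alpha / 2)
    B = 1.0 if x == n else bisect_increasing(lambda P: -lower_tail(x, n, P), -alpha / 2)
    return A, B

A, B = confidence_limits(3, 10)
print(round(A, 3), round(B, 3))
```

For 3 cases out of 10 the computed limits lie close to the chart reading .05 < P < .65; the small difference at the lower end reflects the smoothing of the curves and the precision with which a chart can be read.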

Figure 3-3 was constructed with level of significance α = .05.
It is apparent now that the confidence coefficient is 1 − α, and
the confidence interval for P may be stated in general as

(3.1)    C(A < P < B) = 1 − α

Location of Confidence Limits for Samples of Varying Size
by the Use of Charts. The intervals which can be read from
Figure 3-3 are only approximately correct because of the
discontinuity of the actual sampling distribution for N = 10. When
N becomes larger, the binomial probability distribution appears
smoother because the steps are smaller and more numerous so
that there is less and less discrepancy between it and a fitted
smooth distribution. For small samples such as N = 10 it is not
possible to mark off exactly 2.5% of the area at each end of each
distribution and so the volume outside the curved lines cannot
be assumed to be precisely 5% of the total volume. These
difficulties tend to disappear when samples of larger size are used.

If the rejection points of Figure 3-3 are not marked but only 
the curves drawn, it is possible to draw on one chart the con- 
fidence belts for samples of several different sizes. Such charts 
for confidence coefficients of .99 and .95 have been published by
C. J. Clopper and E. S. Pearson. * Additional charts for confidence 
coefficients of .90 and .80 are published on pages 332 and 333 of 
Techniques of Statistical Analysis? The Clopper-Pearson chart 
for confidence coefficient .95 is reproduced here as Chart VI in 
the Appendix. The small numbers written on the curved lines 
indicate the size of sample for which the confidence belt applies:
N = 10, 15, 20, 30, 50, 100, 250, and 1000. Study of this chart
reveals that for a given confidence coefficient the belt becomes 
narrower as sample size increases. As the size of the sample 
approaches that of the population, the belt narrows to a mere line. 
For any given sample size, the confidence belt would be wider 
if the confidence coefficient were larger. For any finite value of 
N, the confidence belt would narrow to a line as the confidence 
coefficient approaches zero. 

Confidence Intervals for Other Parameters. The preceding 
discussion of confidence intervals applies also to parameters other 
than P. A confidence statement about a population mean made 
with confidence coefficient .90, for instance, might read 

C(46.3 < μ < 52.7) = .90
and one about a population variance made with confidence .99
might read

C(18.3 < σ² < 21.5) = .99
Rules for obtaining the interval estimates for the mean, the 
variance, and the correlation coefficient will be developed in later 
chapters. 

Probability and Confidence. Some discussion of the two terms
probability and confidence is needed to clarify the difference
between them. An analogy between sampling and games of cards
will help to clarify the distinction.

* References referred to by number will be found at the end of the Chapter.

We may conceive of the deck of cards as a population, and of 
a hand dealt after shuffling the deck as a random sample. There 
are two ways of reasoning about the deck of cards and the dealt 
hand: 

(1) From a known or hypothesized characteristic of the deck 
we can calculate the probability that the hand will have some 
characteristic. We can, for example, calculate the probability 
that in a game of bridge a hand will contain 10 or more spades. 

This form of reasoning is analogous to computing the proba- 
bility of a characteristic of a random sample from a known, or 
hypothesized, characteristic of the population. 

(2) From the known composition of a hand we wish to draw 
some conclusion about the unknown composition of the deck. A 
player who sees his opponent draw a hand consisting of 13 spades 
may lose confidence in the notion that the deck is the standard 
bridge deck. He may go further and say he has confidence that 
the deck has any one of certain types of composition and that it 
does not have certain other types. 

The concept of probability is used in reasoning from a known
population to a random sample. The concept of confidence is used 
in reasoning from an observed sample to its unknown population. 


EXERCISE 3.2 

1. From Chart VI read the confidence interval for the population
proportion if in a sample of 30 cases 9 cases have shown a particular
characteristic, p = 9/30 = .3. Place a card or edge of a ruler on the
chart to mark the vertical line through the point .3 on the horizontal
axis. Note the points where this line cuts the two curves marked 30.
The scale values of the ordinates of these points appear to be
approximately .15 and .50. Thus the interval sought is .15 < P < .50.

2. From Chart VI verify the following interval estimates:

a. If N = 20 and p = .4, .19 < P < .64
b. If N = 30 and p = .4, .22 < P < .60
c. If N = 100 and p = .4, .30 < P < .50
d. If N = 1000 and p = .4, .37 < P < .43

3. What interval would you read from Chart VI if N = 50 and p = .44?
The scale point .44 is not marked on the horizontal axis but can be
estimated by eye with fair accuracy. Then .29 < P < .58.

4. What interval would you read if N = 40 and p = .75? For N = 30,
the interval is .55 < P < .89. For N = 50 it is .60 < P < .87. A rough
estimate for N = 40 would place the interval limits between these two
but a little closer to the limits for N = 50. A fairly good guess would
be .58 < P < .88.

5. Read estimates for P from Chart VI when 


a. N = 20 and p = .65       d. N = 30 and p = .25
b. N = 100 and p = .65      e. N = 50 and p = .25
c. N = 250 and p = .65      f. N = 1000 and p = .25


6. Comparing the answers obtained in question 5, what appears to be 
the effect on a confidence interval of changing the size of the sample while 
keeping the confidence coefficient constant? 

7. In a sample of 100 ball bearings chosen at random from those coming
from a factory during one morning, 10 have too large a diameter and are 
considered defective. What interval estimate can be read from Chart VI 
for the proportion with too large diameter in the entire morning’s output 
of the factory? 

8. In a random sample of 96 parents of grade school children in City A 
it is found that 75 are in favor of using tax money to establish a nursery 
school in connection with the public schools. Formulate an interval
estimate with α = .05 for the proportion of all parents holding the same
opinion. 


Aids Available for Computing Binomial Probabilities. Com- 
putation of the terms of the binomial distribution is so laborious 
for large values of N that anyone who needs to make tests about 
proportions will wish to make use of published aids and also of 
methods of approximation. In the Appendix of this book are 
tables which can be used to reduce arithmetic labor. 

Table IVB gives the cumulative probability, or the sum, of 
the first m + 1 terms in either tail of the symmetrical binomial 
distribution for which P = .5, for values of N running from 5 to 
25. The entries in the right-hand column of Table 3.1 on page 45
were taken from the entries in the row N = 10 of Table IVB.

Thus for m = 1, .011 = .001 + .010
         m = 2, .055 = .001 + .010 + .044
         m = 3, .172 = .001 + .010 + .044 + .117

For P = .75, the cumulative probability for the first m + 1
terms in the left tail is given in Table IVA, and for the last m + 1
terms in the right tail in Table IVC.

For P = .25, the cumulative probability for the first m + 1
terms in the left tail is given in Table IVC, and for the last m + 1
terms in the right tail in Table IVA.
Suppose now that we wish to test the hypothesis P = .75 by
means of a sample of 24 cases, using approximately the .05 level
of significance. Suppose further that we wish to discard the
hypothesis if p is either too much larger or too much smaller than
.75. What is the region of significance? If we had the entire
distribution of $(.25 + .75)^{24}$, we could find the left critical
region by adding terms at the first end of the distribution,

$(.25)^{24} + \binom{24}{1}(.25)^{23}(.75) + \binom{24}{2}(.25)^{22}(.75)^{2} + \cdots$

until the sum of those terms was approximately .025. The same
result can be achieved painlessly by looking in row N = 24 of
Table IVA for the entry nearest .025. This is .021 in column
m = 13, and it indicates that

P(X = 0, 1, 2, ..., 12 or 13 | N = 24 and P = .75) = .021

Then the left critical region consists of the values X = 0, 1, ..., 13
or of p = 0/24, 1/24, ..., 13/24. To find the right critical region,
we could add terms at the right end of the distribution

$(.75)^{24} + \binom{24}{1}(.25)(.75)^{23} + \binom{24}{2}(.25)^{2}(.75)^{22} + \cdots$

until their sum was approximately .025. This result can be
achieved painlessly by looking in row N = 24 of Table IVC for
the entry nearest .025. This is .040 in the column m = 2. It
may be read to mean either that

        P(X = 0, 1 or 2 | N = 24 and P = .25) = .040
or that P(X = 24, 23, or 22 | N = 24 and P = .75) = .040

The latter interpretation provides us with the right critical region
needed. It is X = 24, 23 and 22 or p = 24/24, 23/24, and 22/24.
These two regions combined have probability .021 + .040 = .061.
Because of the discrete nature of the distribution this is as near
to the .05 level as we can come.

Under the same circumstances if we wish to use a one-tailed
test, rejecting the hypothesis only for values of X much smaller
than those expected under the hypothesis, we should look in
Table IVA for the entry nearest .05. This is .055 in the column
m = 14. It indicates that the critical region consists of X = 0,
1, 2, ..., 14. If we wish to use a one-tailed test rejecting the
hypothesis only for values of X much larger than those expected
under the hypothesis, we should look in Table IVC for the entry
nearest .05. It is .040 and indicates that the critical region with
probability .04 consists of X = 24, 23 and 22.
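
Where the printed tables are not at hand, the search for the cumulative entry nearest a target can be done by computation. The following sketch is an illustration in Python, with the function name `nearest_tail` and its arguments introduced here as assumptions; it cumulates binomial terms from one end and returns the cutoff whose cumulative probability is nearest the target, mirroring the table lookup described above.

```python
from math import comb

def pmf(x, n, P):
    """Binomial probability of x successes in n cases."""
    return comb(n, x) * P**x * (1 - P)**(n - x)

def nearest_tail(n, P, target, tail):
    """Return (cutoff, probability) for the tail region whose cumulative
    probability is nearest to target. tail is 'left' (X <= cutoff)
    or 'right' (X >= cutoff)."""
    order = range(n + 1) if tail == "left" else range(n, -1, -1)
    candidates, cum = [], 0.0
    for x in order:
        cum += pmf(x, n, P)
        candidates.append((x, cum))
    return min(candidates, key=lambda item: abs(item[1] - target))

# Two-tailed test of P = .75 with N = 24, aiming at .025 in each tail
left = nearest_tail(24, 0.75, 0.025, "left")
right = nearest_tail(24, 0.75, 0.025, "right")
print(left, right, round(left[1] + right[1], 3))

# One-tailed test in the lower tail, aiming at .05
print(nearest_tail(24, 0.75, 0.05, "left"))
```

Choosing the entry nearest the target, rather than the largest entry not exceeding it, follows the rule used in this passage; it can give a region slightly larger than the nominal level, as the combined size of about .061 shows.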
If some value of P other than .25, .50 or .75 is needed, Table V
may be used. It provides the binomial coefficients $\binom{N}{m}$
from which the terms of the binomial distribution can be obtained by
multiplication. The (m + 1)st term of the distribution for given
N and P is obtained by multiplying the number in row N and
column m of this table by $Q^{N-m}P^{m}$. Up to N = 10 all the
coefficients are shown. For N > 10 the page is too narrow to
accommodate the entire set, but as the set is always symmetrical
the remaining coefficients can easily be obtained from those
presented. The coefficient for term N − m is the same as for
term m. Table V is useful only for N not larger than 20.

Very extensive tables of the binomial probability distribution 
have been published by the National Bureau of Standards.* 
These give values of the individual terms of $(Q + P)^{N}$ for values
of P from .01 to .50 by intervals of .01 and for values of N from 
2 to 49 by intervals of 1. They also give values of the sum of 
the terms for the same values of P and N. 


EXERCISE 3.3

1. Suppose that from the rolls of a large university the cards of 22 
students are selected at random and these students are asked to agree or 
disagree with a statement of opinion on some public issue of the day, but 
are not permitted to say they are undecided or uninformed. Let p be the 
proportion answering “Yes” and q the proportion answering “No.” If 
the hypothesis that P = .5 proves tenable, it will be concluded that the 
students in this university have no consistent opinion on this issue. What 
number of “Yes” answers would constitute a critical region for the rejec- 
tion of this hypothesis? Answer for the .02 level of significance, and for the 
.05 level. 

Solution. The population which is the entire student body of the 
university, is not infinite but is large. The hypothesis will be refuted if 
either too many or too few students answer “Yes.” At the .02 level of 
significance, the critical region will consist of the most extreme terms at 
the lower end of the binomial with Р = .5 and N = 22 such that the sum 
of the probabilities of those terms is .01 and the most extreme terms at the 
upper end of the distribution such that the sum of the probabilities of those 
terms is .01. On line N = 22 of Table IVB we see that .008 is the sum of
6 terms at one end of the distribution. Therefore if X is the number of
“Yes” answers, at the .016 level of significance (which is the nearest
possible value to .02 when the critical region is in both tails), the
critical region consists of values of X ≤ 5 and X ≥ 17. At the .05 level
of significance the critical region is X ≤ 6 and X ≥ 16.
2. Answer the question of problem 1 under the following circumstances: 
a. N = 14 and level of significance is .06 
b. N = 18 and level of significance is .03
c. N = 25 and level of significance is .05 


One-sided and Two-sided Tests of Hypotheses. In Chapter 1, 
in testing the hypothesis that a subject’s intelligence could not 
be inferred from his photograph, it was tacitly assumed that rank- 
ing photographs in completely wrong order (DCBA when ABCD 
is correct) or misjudging every pair of photographs (thus making 
the record — – – – – ) would not throw suspicion on the hy- 
pothesis. Therefore the critical region was located entirely in one 
tail of the probability distribution. Such tests are called one- 
tailed tests, or one-sided tests, or tests of a one-sided hypothesis. In 
Figure 3-1 on page 47 and the test related to it, the critical region 
was located in both tails of the distribution, half of the region 
being in each tail. Such tests are called two-tailed tests or two- 
sided tests, or tests of a two-sided hypothesis. Up to this point the 
choice between a one-sided and a two-sided test has been made 
more or less intuitively from the general logic of the question to
be answered by the test. We shall now develop certain concepts 
which will assist in clarifying the reasons for the location of the 
critical region. 

The letter H is customarily used as an abbreviation for
hypothesis. Thus H : P = .5 means “the hypothesis that in the
population the proportion of cases having the given characteristic
is .5” while H : P ≥ .5 means “the hypothesis that in the
population the proportion of cases having a given characteristic
is .5 or greater.”

Two Kinds of Error. Rejecting a hypothesis when it is true 
is called an error of the first kind. The level of significance is the 
probability of making an error of this kind. As the level of
significance is commonly called α (the small Greek letter alpha
corresponding to the English a), an error of the first kind is also
called an alpha error. The probability of an error of this kind
can be made arbitrarily small by making alpha small, but
unfortunately a reduction in the probability of rejecting a hypothesis
when it is true is accompanied by an increase in the probability
of accepting it when some alternative to the hypothesis is true.
This latter error is called an error of the second kind or a beta error.
The Greek letter β (beta) is used to represent the probability of
accepting a hypothesis when some alternative is true. Most
statisticians do not look with favor upon choosing an extremely small
level of significance because that would expose them to a large risk
of error of the second kind. This relation will be clarified in the
following paragraphs.

Suppose that P is actually .6 but we do not know that fact 
and we test the hypothesis P = .5 using a sample of 10 cases.
Suppose also that, knowing the probability distribution of p under
the hypothesis tested, we decide upon p = 0, .1, .9 or 1.0 as the
region of rejection. Thus


    α = .001 + .010 + .010 + .001 = .022 or .02.


What is the probability that the hypothesis P = .5 will be rejected
when P is really .6? Accepted when P = .6?

Table 3.2 on page 49 provides a very easy means of answering 
this question. As we are assuming that P is actually .6, the true 
probability distribution of p is given in the row for which P = .6. 
Then 

    P(p = 0, .1, .9 or 1.0 | P = .6) = 0 + .002 + .040 + .006 = .048
and P(.2 ≤ p ≤ .8 | P = .6) = 1 − .048 = .952


The risk of a false acceptance of H : P = .5 when P = .6 is therefore
β = .952. With a larger discrepancy between the true value of
P and the value under the hypothesis, the risk of acceptance of a
false hypothesis would be less. For example


    P(p = 0, .1, .9 or 1.0 | P = .1) = .349 + .387 + 0 + 0 = .74
and P(.2 ≤ p ≤ .8 | P = .1) = 1 − .74 = .26


Hence the risk of false acceptance is only .26 when P = .1. The
probabilities here treated as zero are not absolutely zero but are
merely numbers so small that only zeros occur in the first three
decimal places. Rounding errors cause some of the totals at the
right of Table 3.2 to be slightly different from 1.000.
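
The two risks just discussed can also be checked directly from the binomial formula instead of Table 3.2. The following is a minimal sketch under that assumption; the function names `binom_pmf` and `prob_in_region` are illustrative and not part of the text. Because it works from unrounded probabilities, it gives α = .021 where the text, adding rounded table entries, obtains .022.

```python
from math import comb

def binom_pmf(m, n, p):
    """Probability of exactly m cases with the trait in a sample of n."""
    return comb(n, m) * p**m * (1 - p)**(n - m)

def prob_in_region(region, n, p):
    """Probability that the number of cases with the trait falls in `region`."""
    return sum(binom_pmf(m, n, p) for m in region)

N = 10
region = [0, 1, 9, 10]            # counts corresponding to p = 0, .1, .9, 1.0

alpha = prob_in_region(region, N, 0.5)         # error of the first kind
power_at_06 = prob_in_region(region, N, 0.6)   # P(reject H | P = .6)
beta_at_06 = 1 - power_at_06                   # error of the second kind

print(round(alpha, 3), round(power_at_06, 3), round(beta_at_06, 3))
```

Changing the second argument of the last two calls from 0.6 to 0.1 gives the corresponding risks for the alternative P = .1.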

Similar computations have been made for each of the values 
of P shown in Table 3.2 and these have been recorded in columns 2 
and 3 of Table 3.3. For all these the level of significance was
taken as α = .02 and the region of significance as p = 0, .1, .9 or 1.

If the region of significance is increased to include p = .2 and
p = .8, the significance level becomes α = .11. The probability
of rejecting H : P = .5 for this situation has been computed for
each of the same values of P and recorded in column 4 of Table
3.3; the probability of accepting H : P = .5 is recorded in column 5.


TABLE 3.3 The Probability of Accepting the Hypothesis P = .5 on Evidence
from a Sample of 10 Cases when P has a Specified Value, if α = .02 and if
α = .11


                       α = .02                        α = .11

True          Probability    Probability    Probability    Probability
Value of P    of rejecting   of accepting   of rejecting   of accepting
              H : P = .5     H : P = .5     H : P = .5     H : P = .5

   .95            .91            .09            .99            .01
   .90            .74            .26            .93            .07
   .85            .54            .46            .82            .18
   .80            .38            .62            .68            .32
   .75            .24            .76            .53            .47
   .70            .15            .85            .38            .62
   .65            .09            .91            .27            .73
   .60            .05            .95            .18            .82
   .55            .03            .97            .13            .87
   .50            .02            .98            .11            .89
   .45            .03            .97            .13            .87
   .40            .05            .95            .18            .82
   .35            .09            .91            .27            .73
   .30            .15            .85            .38            .62
   .25            .24            .76            .53            .47
   .20            .38            .62            .68            .32
   .15            .54            .46            .82            .18
   .10            .74            .26            .93            .07
   .05            .91            .09            .99            .01


Rejection values are p = 0, .1, .9 and 1.0 when α = .02
Rejection values are p = 0, .1, .2, .8, .9 and 1.0 when α = .11


The test of a statistical hypothesis involves a statistic and
a critical region. The probability that the statistic will fall in
the critical region is the probability that the hypothesis will be
rejected. This probability of rejecting a hypothesis is called the
power of the test. The power is a variable which depends upon
what alternative to the hypothesis is actually true. If we knew
that a certain alternative were true we should not need to test
the hypothesis. So we cannot establish the power of the test for
the one correct alternative but must consider the power in general
for various alternatives. If two tests are equally satisfactory as
to the level of significance and so involve the same risk of rejecting
a hypothesis when it is true, but one test is more powerful than
the other because it is more likely to cause rejection of a false
hypothesis, then the more powerful test is to be preferred. An
illustration of this situation will be given presently. 

Certain important relations can be brought out more clearly
by means of a graph. In Figure 3-4 values of P which are alternatives
to the hypothesis are laid off on the horizontal scale. The
value under the hypothesis is marked H. The vertical axis passing
through H is scaled to show probabilities from 0 to 1. At
each of the indicated values of P an ordinate is erected with
length proportional to the power of the test for that alternative.
For the curve marked α = .02 these ordinates represent the
probabilities in column 2 of Table 3.3; for the curve marked α = .11
the ordinates represent the probabilities in column 4. The curved
line which connects the tops of one set of these ordinates is the
graph of the power function of the test concerned. This graph
crosses the vertical axis at a point whose ordinate is the level
of significance.


Fig. 3-4. Power function of the test of
hypothesis P = .5 for N = 10 at two levels
of significance, α = .02 and α = .11.
(Vertical axis: probability of rejecting H : P = .50.
Horizontal axis: scale of P, with the hypothesized value marked H.)
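
For readers who wish to trace the two power curves numerically, the ordinates can be computed from the binomial formula. This is only a sketch; `power`, `region_02`, and `region_11` are illustrative names, and the grid of P values is the one used in Table 3.3.

```python
from math import comb

def power(region, n, p):
    """Probability that the count of cases with the trait falls in the critical region."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in region)

N = 10
region_02 = [0, 1, 9, 10]          # critical region for alpha = .02
region_11 = [0, 1, 2, 8, 9, 10]    # critical region for alpha = .11

# Alternatives P = .05, .10, ..., .95; the row P = .50 is the hypothesis itself,
# so its ordinate is the level of significance.
for i in range(1, 20):
    P = i / 20
    r02 = power(region_02, N, P)
    r11 = power(region_11, N, P)
    print(f"{P:.2f}  {r02:.2f}  {1 - r02:.2f}  {r11:.2f}  {1 - r11:.2f}")
```

Each printed row gives, in order, the true value of P, the probability of rejecting and of accepting H with the smaller region, and the same two probabilities with the larger region.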


EXERCISE 3.4 

1. Verify the probabilities stated in Table 3.3 by computation from
the data of Table 3.2.

2. Make similar computations with α = .002 and critical region p = 0
or 1. Plot the corresponding power function on the chart of Figure 3-4.
Note that it lies entirely below the other graphs. Does this mean the test
with α = .002 is more powerful or less powerful than the test with α = .02?

3. Answer each of the following questions by naming a line segment
in Figure 3-4. Place the letter J where the lower curve and K where the
upper curve cut the vertical axis. Place M at top of that axis.

a. α = .02

b. α = .11

c. the probability of rejecting H at significance level α = .02 if P is .2

d. the probability of rejecting H at significance level α = .11 if P is .2

e. the probability of accepting H when it is true if α = .02

f. the probability of accepting H when it is true if α = .11

g. the probability of accepting H when the actual value is P = .2 if
α = .02

h. the probability of accepting H when the actual value is P = .2 if
α = .11

i. β when P = .20 and α = .02

j. β when P = .20 and α = .11

4. From your answers to question 3 which significance level provides 
the more desirable test when the hypothesis tested is true? Which provides 
the more desirable test when some other hypothesis is true? 

5. In testing H : P = .5 with critical region p = 0, .1, .9, and 1, against
which alternative is the test more powerful, P = .60 or P = .80?

6. Is the test more powerful in general when α = .02 or when α = .11?

7. In Figure 3-4 what represents the probability of error when the
hypothesis is true? When it is not true?

8. Suppose a one-sided test with critical region p = .7, .8, .9, and 1 is
made of H : P = .50. What is the value of α? Compute the power of this
test for each alternative value of P shown in Table 3.2. Make a graph of
the power function of this test.

9. Suppose a one-sided test with critical region p = 0, .1, .2, and .3 is
made of H : P = .50. What is the value of α? Compute the power of
this test for each alternative value of P shown in Table 3.2. Make a graph
of the power function of this test.


Choice of Critical Region. Figure 3-5 displays the power function
for each of three different tests of the hypothesis P = .50.
These tests differ as to the location of the critical region. For
test A the region is located wholly in the upper tail; for test C
it is wholly in the lower tail; for test B half of the region is in 
the upper tail and half in the lower. The power function for test 
B has already been drawn in Figure 3-4 and is reproduced here 
for the sake of comparison with those for the one-sided tests. 
The situations in which one of these three tests is more powerful 
than the others could be explained better if it were possible to 
make all three regions identical in size, that is to give all three 
tests the same level of significance. However, the discrete nature 
of the binomial distribution makes it impossible to choose a level 
of significance arbitrarily and so it is impossible to find a one- 
sided region and a two-sided region having exactly the same 
probability. Later when we work with continuous probability
distributions it will be possible to draw a figure similar to Figure
3-5 but with all three curves crossing the vertical axis at the same 
point. The meaning of these graphs will be clarified by the answers 
to the questions in Exercise 3.4 and you may wish to study that 
exercise before you read the following paragraph. 


Fig. 3-5. Power function of the test of hypothesis
P = .5 for N = 10.
(Vertical axis: probability of rejecting H. Horizontal axis: scale of P
from 0 to 1.0, with the hypothesized value marked H.)
A. With α = .17 and critical region p = .7, .8, .9, and 1.0.
B. With α = .11 and critical region p = 0, .1, .2, .8, .9, and 1.0.
C. With α = .17 and critical region p = 0, .1, .2, and .3.

Certain generalizations can now be made. (1) When a hypothesis
is true, the probability of falsely rejecting it depends
only on the level of significance and is not at all affected by the
location of the critical region. The probability of an α error is
fixed and is the same no matter whether a one-sided test is
employed or a two-sided test. (2) After the probability of type I
error has been controlled by the arbitrary selection of the
significance level, the critical region can be so located as to secure
maximum power against a particular alternative or set of
alternatives. If the logic of the problem makes it important to reject
H when P > P_H and unimportant to reject when P < P_H, then
the critical region should consist of values at the extreme of the
right-hand tail. On the other hand if there is no logical need to
reject H when the true value is larger than the hypothesized value
but every reason to reject it when the true value is smaller, then
the lower tail will be critical. If deviations from hypothesized
value in one direction are logically just as damaging to the
hypothesis as deviations in the other direction, then a two-sided
test should be used with half of the critical region in each tail.
(3) Other things being equal, an increase in α is accompanied
by a decrease in β and vice versa.

The form in which the hypothesis is stated may be used to
indicate the location of the critical region, thus:


H : P ≥ .8 implies a one-sided test with left tail critical
H : P ≤ .8 implies a one-sided test with right tail critical
H : P = .8 implies a two-sided test


This practice is not universal, however, and a one-sided hypothesis 
may be written as in question 1 of Exercise 3.5. 
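
The effect of locating the critical region in one tail or in both can be seen by computing the power of the three tests of Figure 3-5 at an alternative on each side of the hypothesis. The sketch below assumes the three critical regions given in the legend of that figure; the dictionary `tests` and the function `power` are illustrative names only.

```python
from math import comb

def power(region, n, p):
    """Probability of rejecting H when the true proportion is p."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in region)

N = 10
tests = {
    "A": [7, 8, 9, 10],           # upper tail only
    "B": [0, 1, 2, 8, 9, 10],     # both tails
    "C": [0, 1, 2, 3],            # lower tail only
}

for name, region in tests.items():
    alpha = power(region, N, 0.5)       # risk of rejecting a true hypothesis
    above = power(region, N, 0.6)       # power when P is larger than .5
    below = power(region, N, 0.4)       # power when P is smaller than .5
    print(f"Test {name}: alpha = {alpha:.2f}, "
          f"power at .6 = {above:.2f}, power at .4 = {below:.2f}")
```

The comparison is not perfectly fair, since the one-sided regions have a somewhat larger level of significance than the two-sided region, as the text has already noted.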


EXERCISE 3.5 

1. From the data of Table 3.2 find the probability of rejecting H : P = .5 
if the critical region is p = .7, .8, .9 and 1 and if P = .35, .40, .60, .70. Plot 
these probabilities on Figure 3-5. Each of the points should fall on the 
curve marked A. 

2. Do the same for the critical region p = 0, .1, .2, and .3. These
points should fall on the curve marked C.

3. What is the probability of rejecting a true hypothesis if test A is 
used? Test B? Test C? 

4. When the hypothesis is true, does the critical region for B provide 
a more or less satisfactory test than the critical regions for A and for C? 
Make your decision in the light of your answer to question 3. 

5. When the true probability is P = .40, estimate from the graph

a. the probability of rejecting H : P = .50 when critical region is
p = .7, .8, .9, and 1

b. the probability of rejecting H : P = .50 when critical region is
p = 0, .1, .9, and 1

c. the probability of rejecting H : P = .50 when critical region is
p = 0, .1, .2, and .3

d. the probability of an error of the second kind when Test A is used 

e. the probability of an error of the second kind when Test B is used 

f. the probability of an error of the second kind when Test C is used 

6. If P > P_H, which of the three tests has the greatest power?
Which has the least power?

7. If P < P_H, which has the greatest power? Which the least?

8. A research worker has tested the hypothesis P ≥ .75 using a sample
of 20 cases. He decides to make α as near .05 as possible.

a. What is the critical region? Ans. From the way in which the
hypothesis is stated, we see that the critical region should be located in the
left tail of the binomial for which P = .75 and N = 20. Reference to the
row N = 20 of Table IV shows α = .041 as the value nearest to α = .05.
This value corresponds to m = 11, or p = 11/20 = .55. Hence the
critical region is p ≤ .55.

b. If he obtains a value of p = .40 what decision should he make
concerning the hypothesis? Ans. Reject it.

c. In making this decision, what risk does he take of making an error
of the first kind? Ans. Risk is less than α = .041.

d. In making this decision, what risk does he take of making an error 
of the second kind? Ans. None whatever, he has accepted no hypothesis 
at all so cannot possibly have accepted a false one. 

9. Answer the same four questions if the observed value had been
p = .70.

a. Same answer. 

b. Accept hypothesis. 

c. No risk whatever of rejecting true hypothesis. 

d. There is a risk of error of second type but that risk depends on the
alternative P_A and cannot be assessed from the data presented.

10. Below are several statements of hypotheses, with critical region 
and level of significance for each. By reference to Table IV, verify the 
correspondence of significance level and critical region. In the column 
headed Decision, record A or R to indicate whether the hypothesis 
should be accepted or rejected. In the next column indicate the risk of
error of the first type. In the final column, record O if there is no risk of
error of the second type and record X if there is such risk. The magnitude
of such risk cannot be computed from the given data.

                                                                           Risk of error
Problem    N     H          Critical region          α      Observed p   Decision   First type   Second type

   1       25    P ≥ .75    p ≤ .6                   .071      .5          ___         ___          ___
   2       20    P = .5     p ≤ .3 or p ≥ .7         .114      .4          ___         ___          ___
   3       24    P ≥ .25    p ≤ .125                 .115      .10         ___         ___          ___
   4       24    P = .5     p ≤ .25 or p ≥ .75       .022      .63         ___         ___          ___
   5       22    P ≥ .75    p ≤ .50                  .010      .67         ___         ___          ___

Large Sample Tests Concerning Proportions. The approxima- 
tion of the normal distribution to the binomial discussed in 
Chapter 2 can be used as a basis for large sample tests of
hypotheses on proportions. Use of the normal distribution for this
purpose means that areas under the unit normal curve replace
sums of terms in the tails of the distribution. For this purpose
the statistic


(3.1)    z = \frac{p - P}{\sqrt{PQ/N}}


introduced in Chapter 2 is treated as a normal deviate.
To test a hypothesis about P on the basis of a sample value
p, making use of the normal distribution, we first choose a level
of significance α, say α = .05 or α = .01. Then we decide from
the logic of the problem where the critical region should be
located. The hypothesis is rejected whenever


for a one-tailed test with right-tail critical, z > z_{1-\alpha}
for a one-tailed test with left-tail critical, z < z_{\alpha}
for a two-tailed test, z < z_{\alpha/2} or z > z_{1-\alpha/2}.

As an application consider the following problem: 

The proportion of all births which are male is known to be approxi- 
mately .51, a number which is generally held to be unrelated to any easily 
identified factor such as geographic location, race or economic status of
parents. It has, however, sometimes been asserted that the ratio of male
births tends to be higher in the years following a war or other great disaster.
Suppose that in the 12-month period immediately following the close of
World War II the birth records of a certain city showed 217 male births
and only 184 female, the proportion of male births being thus 217/401 = .54
instead of .51. Is this excess of male births over expectation to be inter- 
preted as evidence that some factor has been at work to increase the ratio? 


As phrased the question calls for a one-sided test of H : P = .51, 
since we wish to reject that hypothesis only if the proportion of 
male births is too high and not if it is too low. Suppose the level 
of significance is chosen to be α = .05. Then z_{.95} = 1.645 and the
region of rejection is z > 1.645. But the observed value of z is


    z = \frac{.54 - .51}{\sqrt{(.51)(.49)/401}} = 1.20


As z = 1.20 does not fall in the critical region, the hypothesis
P = .51 is accepted. The deviation from hypothetical value is no
greater than may be expected to arise by chance as the result of
random sampling.
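
The same one-sided test can be carried out by machine. The sketch below uses only the Python standard library; `z_statistic` is an illustrative name. Because it keeps the unrounded proportion 217/401 rather than .54, it yields a z of about 1.25 instead of the 1.20 found above, but the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(p_obs, P_hyp, n):
    """Normal deviate for an observed proportion under the hypothesis P = P_hyp."""
    return (p_obs - P_hyp) / sqrt(P_hyp * (1 - P_hyp) / n)

males, females = 217, 184
n = males + females                      # 401 births
p_obs = males / n                        # unrounded sample proportion

z = z_statistic(p_obs, 0.51, n)
z_critical = NormalDist().inv_cdf(0.95)  # right-tail critical value for alpha = .05
reject = z > z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H: {reject}")
```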

Confidence Limits for Proportions Computed from Large 
Samples. From Chart VI confidence limits can be obtained for se- 
lected values of N when the confidence coefficient is .95. How- 
ever, a chart cannot well accommodate lines for all sizes of N, 
and a separate chart is required for every size of confidence 
coefficient. Moreover, sometimes greater accuracy is required
than can be obtained by graphic methods. It is desirable to have
a method for obtaining confidence limits by formula. As before
it must be assumed that N is not small and P is not near 0 or 1,
so the normal approximation to the binomial may be used.

To obtain the confidence limits we first select the confidence
coefficient arbitrarily. The confidence limits are then derived
from the large sample distribution of the sample proportion p. We
know that for samples from a two-class population with given P
the probability is 1 − α that


(3.2)    z_{\alpha/2} < \frac{p - P}{\sqrt{PQ/N}} < z_{1-\alpha/2}


This inequality can be restated in the following form, in which z
stands for z_{1-\alpha/2}:


(3.3)    \frac{2p + \frac{z^2}{N} - z\sqrt{\frac{4pq}{N} + \frac{z^2}{N^2}}}{2\left(1 + \frac{z^2}{N}\right)}
         < P <
         \frac{2p + \frac{z^2}{N} + z\sqrt{\frac{4pq}{N} + \frac{z^2}{N^2}}}{2\left(1 + \frac{z^2}{N}\right)}


The second inequality is obtained from the first by algebraic
manipulation which does not change the probability. These
extremes are, therefore, confidence limits with confidence
coefficient 1 − α when p is the proportion observed in a sample.

A good approximation to the inequality in Formula (3.3) is
given by the inequality


(3.4)    p + z_{\alpha/2}\sqrt{\frac{pq}{N}} < P < p + z_{1-\alpha/2}\sqrt{\frac{pq}{N}}
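
To see how closely the simpler limits agree with those obtained by solving the inequality for P, both may be computed side by side. In this sketch the first function solves z² = (p − P)²N / (P(1 − P)) for P, and the second uses p ± z√(pq/N); the function names are illustrative, and z is taken as the upper normal deviate for the chosen confidence coefficient.

```python
from math import sqrt
from statistics import NormalDist

def upper_deviate(conf):
    """Normal deviate z such that the central area under the curve equals conf."""
    return NormalDist().inv_cdf(1 - (1 - conf) / 2)

def limits_exact_form(p, n, conf):
    """Limits from solving the quadratic inequality in P."""
    z = upper_deviate(conf)
    q = 1 - p
    centre = 2 * p + z * z / n
    half = z * sqrt(4 * p * q / n + z * z / (n * n))
    denom = 2 * (1 + z * z / n)
    return (centre - half) / denom, (centre + half) / denom

def limits_approx(p, n, conf):
    """Limits from the simpler approximation p +/- z * sqrt(pq/N)."""
    z = upper_deviate(conf)
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

for n in (25, 400, 1991):
    print(n, limits_exact_form(0.70, n, 0.99), limits_approx(0.70, n, 0.99))
```

For small N the two pairs of limits differ noticeably; as N grows they draw together, which is the sense in which the simpler inequality is a good approximation.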


Number of Cases Needed to Obtain a Confidence Interval 
of a Given Width. Suppose a polling organization has been asked 
to estimate the number of voters in a given state who are in favor 
of a particular proposition. The specifications are that the es- 
timate shall be made in the form of an interval 


    C(A < P < B) = .99


in which the width of the interval B − A = W is not greater than
.05. By taking enough cases in their sample the agency can keep
the interval to any predetermined value. Therefore an important
part of their planning is the decision as to how many cases to use.
Let us suppose that some preliminary information is available
from which they estimate that P will be somewhere near .75.
They now have agreed that


    1 − α = .99
    p' = .75
    W = .05


As only a fairly rough estimate of N is required, they may begin
work with Formula (3.4) from which


    A = p + z_{\alpha/2}\sqrt{\frac{p'q'}{N}}   and   B = p + z_{1-\alpha/2}\sqrt{\frac{p'q'}{N}}

so that

    W = B - A = 2 z_{1-\alpha/2}\sqrt{\frac{p'q'}{N}}

(3.5)    Therefore   N = \frac{4 z_{1-\alpha/2}^2 \, p'q'}{W^2}

Substituting the given values yields

    N = \frac{4(2.576)^2(.75)(.25)}{(.05)^2} = 1991

On the basis of this preliminary analysis, the agency might de- 
cide to take a sample of 1991, or it might decide to ask the author- 
ities to allow it to work with a lower confidence coefficient which 
would not require so large a sample. 

Suppose a sample of 1991 cases is taken and the observed 
value of p found to be .70. As the sample is so large, Formula 
(3.4) may be used to compute the confidence interval. The ob- 
tained interval is then 


.6735 < P < .7265 


and W = .7265 — .6735 = .053 is very close to the width of interval 
which has been agreed upon as acceptable. The practical diffi- 
culty in this problem would be to obtain a random sample. If 
the sample is taken by some non-random method, say by the 
quota method often used by polling agencies, there is no assurance 
that p will have the probability distribution which is assumed in 
the formula employed. 
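
The planning computation for the polling agency can be repeated for other specifications with a few lines of code. This sketch assumes the relation N = 4z²p'q'/W², with z the upper normal deviate for the chosen confidence coefficient; `cases_for_width` is an illustrative name.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cases_for_width(p_guess, width, conf):
    """Approximate number of cases giving an interval of the stated width."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    return 4 * z * z * p_guess * (1 - p_guess) / width**2

n = ceil(cases_for_width(0.75, 0.05, 0.99))    # specifications used in the text

# Width actually obtained if the sample yields p = .70 instead of the guessed .75
z = NormalDist().inv_cdf(0.995)
realized_width = 2 * z * sqrt(0.70 * 0.30 / n)

print(n, round(realized_width, 3))
```

Lowering the confidence coefficient in the call, say to .95, shows how much smaller a sample the agency could use if the authorities allowed it.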

Sampling from a Finite Population. Some readers may have 
questioned whether the methods we have been using are appropriate
if the population is finite. To illustrate with an extreme
case, consider a population of 100 individuals of whom 60 possess
a given characteristic, and 40 do not, so that P = .60. Now suppose
samples of 99 individuals are drawn. It is intuitively clear
that a sample of 99 individuals must be quite similar to the entire
population of 100. Since such a sample must contain all of the
population except one individual, there are 60 different samples
which can be obtained by eliminating one individual which
possesses the trait and 40 samples obtained by eliminating one
which does not possess that trait.


                                        Number of              Value of
Composition of sample                   different samples      p
60 cases with trait, 39 without it            40               60/99 = .6060
59 cases with trait, 40 without it            60               59/99 = .5959
                                             ---
                                             100

    \mu_p = \frac{40(.6060) + 60(.5959)}{100} = .60

    \sigma_p^2 = \frac{40(.6060 - .60)^2 + 60(.5959 - .60)^2}{100} = .000024486


This manner of computing σ_p² might be very laborious, but
the result can be obtained by formula. If M represents the number
of cases in the population and N the number in the sample,
then


(3.6)    \sigma_p^2 = \frac{PQ}{N} \cdot \frac{M - N}{M - 1}


In the situation for which we have already computed σ_p² directly,
P = .60, M = 100, N = 99, and therefore


    \sigma_p^2 = \frac{(.60)(.40)}{99} \cdot \frac{100 - 99}{100 - 1} = \frac{.24}{9801} = .000024487,

which differs from the previously obtained value only in the final
digit because of a rounding error.

The formula PQ/N is a special case of (PQ/N)(M − N)/(M − 1) when M is so
large that the fraction (M − N)/(M − 1) may be treated as unity.

Sometimes samples are drawn with replacement, each individual
being returned to the supply before the next one is drawn. In
such a case the sampling is in effect as from an infinite universe
and the multiplier (M − N)/(M − 1) should not be used. If N is large and
M is considerably larger than N, the statistic


(3.7)    z = \frac{p - P}{\sqrt{\frac{PQ}{N} \cdot \frac{M - N}{M - 1}}}


has a sampling distribution which is approximately normal. This
statistic can be used to test hypotheses about P as was done in
the case of sampling from an infinitely large population using the
statistic in formula (2.16) on page 37.


The corresponding confidence interval for P with coefficient
1 − α is (approximately)


(3.8)    p + z_{\alpha/2}\sqrt{\frac{pq}{N} \cdot \frac{M - N}{M - 1}} < P < p + z_{1-\alpha/2}\sqrt{\frac{pq}{N} \cdot \frac{M - N}{M - 1}}


If it is proposed to plan an experiment so that the confidence
interval is to be of size W with confidence coefficient 1 − α then the
number of cases needed is


(3.9)    N = \frac{4M z_{1-\alpha/2}^2 \, p'q'}{4 z_{1-\alpha/2}^2 \, p'q' + (M - 1)W^2}

For M very large these formulas differ very little from the earlier
formulas where the population was taken to be infinite.
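
The agreement between the direct computation and the formula for a finite population can be confirmed without rounding. The sketch below enumerates the 100 possible samples of 99 cases and compares the resulting variance with (PQ/N)(M − N)/(M − 1). It also includes a sample-size function obtained by solving W = 2z√((p'q'/N)(M − N)/(M − 1)) for N; that derivation and all function names here are our own illustration.

```python
from statistics import NormalDist

def variance_by_enumeration():
    """Variance of p over all samples of 99 drawn from 60 with the trait and 40 without."""
    outcomes = [(40, 60 / 99), (60, 59 / 99)]      # (number of samples, value of p)
    total = sum(count for count, _ in outcomes)
    mean = sum(count * p for count, p in outcomes) / total
    return sum(count * (p - mean) ** 2 for count, p in outcomes) / total

def variance_by_formula(P, N, M):
    """PQ/N multiplied by the finite population factor (M - N)/(M - 1)."""
    return P * (1 - P) / N * (M - N) / (M - 1)

def cases_for_width_finite(M, p_guess, width, conf):
    """Cases needed for an interval of the given width from a population of M."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    k = 4 * z * z * p_guess * (1 - p_guess)
    return M * k / (k + (M - 1) * width**2)

print(variance_by_enumeration(), variance_by_formula(0.60, 99, 100))
print(cases_for_width_finite(5000, 0.75, 0.05, 0.99))
print(cases_for_width_finite(10**9, 0.75, 0.05, 0.99))
```

With a population of 5000 the required number of cases is appreciably smaller than for an unlimited population, while for a very large M the result is practically the value found earlier.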

Sample Size in Tests of Hypotheses. The important statistical 
problem of deciding upon the number of cases to be included in a
sample has already been considered in relation to studies in which
the objective is determination of a confidence interval for a
proportion. A similar decision must, of course, be made when the
objective is a test of a hypothesis. In this type of investigation, sample
size is determined by the necessity for minimizing the error of the 
second kind. 

The error of the first kind is controlled by selecting the level of 
significance α, which determines the risk of rejecting the hypothesis
when it is true. The probability, β, of accepting the hypothesis
when some alternative is true is then controlled by the decision as 
to the size of the sample. 

Figure 3-6 shows the power curves for a test of H : P = .5 at
significance level α = .11 made on samples of three different sizes
(N = 10, N = 20, N = 48). In the discrete binomial distribution
only certain values of α can be realized. The three values of N
used here were chosen because in each of the three binomial
distributions with P = .5 and N = 10, N = 20, and N = 48, it
was possible to find a two-tailed critical region with approximately
the same value of α, namely α = .11.


Fig. 3-6. Power function for test of hypothesis P = .5
with α = .11 for samples of size N = 10, 20, and 48.


If the hypothesis tested is true, all three samples entail
approximately the same probability of false rejection, that is α = .11.
However if some alternative is true, such as P = .6, the larger
sample provides much greater probability of rejecting the false
hypothesis P = .5.

It will be helpful to answer several questions by examination
of Figure 3-6.

(a) What is the probability of rejecting H : P = .5 when P is
actually .6 if N = 10? If N = 20? If N = 48? (Answer: .18,
.26, .42)

(b) What is the probability of rejecting H : P = .5 when P = .2
if N = 10? If N = 20? If N = 48? (Answer: .68, .91, 1.00)

(c) Suppose a statistician decides to accept the risk of rejecting
the hypothesis P = .5 in 11% of samples if it is true and a risk of
accepting the hypothesis in 30% of samples if some alternative is
true, that is α = .11 and β = .30. If he uses a sample of 48 cases,
how much will P have to differ from .5 in order that he may detect
the discrepancy with the probability agreed upon? Solution:
Figure 3-6 is drawn with α = .11, hence referring to that diagram
insures the chosen probability of type one error. If β = .30 the
probability of rejecting must be 1 − .30 = .70. Draw a horizontal
line through the point marked .7 on the vertical scale. This line
intersects the curve for N = 48 at two points for which P = .35
and P = .65. Then for .35 < P < .65 the test will have power
less than .70 and therefore will have probability greater than .30
of failing to reject H : P = .50. If P is .65 or .35 the test will have
power .70 of rejecting the hypothesis tested, and therefore the
probability of type two error will be .30. If P > .65 or P < .35
the power will be greater than .70 and so the probability of type
two error will be less than .30.

(d) Suppose a statistician decides upon significance level
α = .11 and upon probability not greater than .20 of accepting
H : P = .5 when P is actually .3. How large a sample should he
take to insure these results? Solution: Draw a vertical line
through P = .3 and a horizontal line through probability .8.
(β = 1 − .8 = .2.) These lines meet at a point between the curves
N = 20 and N = 48 and nearer to the latter. Therefore, a sample
of 20 cases will be inadequate while one of 48 cases will be
somewhat larger than is needed.
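
Readings taken from a chart such as Figure 3-6 can be checked by exact binomial computation. The text does not list the critical regions used for N = 20 and N = 48, so the sketch below makes an assumption: for each N it takes the symmetric two-tailed region (m ≤ c or m ≥ N − c) whose level of significance under P = .5 is closest to .11. The names `binom_cdf`, `symmetric_cutoff`, and `power` are illustrative.

```python
from math import comb

def binom_cdf(c, n, p):
    """Probability of c or fewer cases with the trait in a sample of n."""
    return sum(comb(n, m) * p**m * (1 - p)**(n - m) for m in range(c + 1))

def symmetric_cutoff(n, target_alpha):
    """Cutoff c whose region m <= c or m >= n - c has alpha nearest the target (assumed rule)."""
    return min(range(n // 2),
               key=lambda c: abs(2 * binom_cdf(c, n, 0.5) - target_alpha))

def power(n, c, p):
    """Probability that the count falls in either tail of the critical region."""
    lower = binom_cdf(c, n, p)
    upper = 1 - binom_cdf(n - c - 1, n, p)
    return lower + upper

for n in (10, 20, 48):
    c = symmetric_cutoff(n, 0.11)
    print(f"N = {n}: region m <= {c} or m >= {n - c}, "
          f"alpha = {power(n, c, 0.5):.3f}, "
          f"power at .6 = {power(n, c, 0.6):.2f}, "
          f"power at .3 = {power(n, c, 0.3):.2f}, "
          f"power at .2 = {power(n, c, 0.2):.2f}")
```

The column for P = .3 bears on question (d): the required power of .80 lies between the values obtained for N = 20 and N = 48.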

To solve problems of this sort by reference to a chart would 
require a separate chart for each different value of α which might
be chosen, a separate chart for each value of the hypothesis
tested, and on that chart a curve for each of many different values
of N. As such charts are not available, and as such problems
are very real, it will be advantageous to develop another method
of approach.

Suppose that a State Superintendent of Instruction is con- 
cerned with the number of children in his state who hold super- 
stitious beliefs and with devising some means of reducing that 
number. He devises a measurement instrument, and trying it 
out in the ninth grade classes in a large number of schools, he finds 
that about 70% of the pupils return answers which would indicate 
they are “very superstitious.” A committee of teachers then 
develops teaching materials which they believe will reduce the
prevalence of superstitious beliefs, but the teaching methods are 
somewhat time-consuming. Before they are put into general use 
it seems desirable to gain assurance that they really effect a re- 
duction in the proportion of superstitious pupils. 

Therefore it is planned that a new sample of ninth grade pupils
shall be taught by the new materials and then given the same
test on which previously 70% of the former sample were rated
“very superstitious.” The research worker conducting the study
[...] (1) [The hypothesis to be tested is that the new]
method is no better than the old, [that is, H :] P ≥ .70. As this is to be
[rejected only if P is less] than .70, the critical region is in the left tail.
(2) To limit [the] risk of adopting the new method if it is not superior to
[the old, the level of significance is taken as α = .05.] [(3) To limit the]
risk of not recognizing the superiority of the new method if
superiority exists, they decide that [if the true proportion is .60]
or smaller, they are willing to take only a 10% risk of [not]
discovering that superiority. That is, β = .10, so the [power of the]
test against the alternative P = .60 will be .90.

Having agreed upon these requirements they can solve [the]
question as to what size of N will be required. It must be assumed
that the sample is large enough to treat p as normally distributed
with mean P and standard error \sqrt{PQ/N}. There are two
distributions to be considered, one distribution under the hypothesis
P_H = .70 and another under the alternative P_A = .60. These two
distributions are shown in Figure 3-7.


Fig. 3-7. The distribution of p under the hypothesis P_H = .7 and under
the alternative P_A = .6.
Critical region is the baseline to left of C.
α is the shaded area to left of C under the curve specified by the hypothesis P = .7.
β is the area to right of C under the curve specified by the alternative P = .6.


Since α = .05, we have from the curve at the right


    \frac{C - .7}{\sqrt{(.7)(.3)/N}} = z_{.05}

and since β = .10, we have from the curve at the left

    \frac{C - .6}{\sqrt{(.6)(.4)/N}} = z_{.90}


When the numerical values of z_{.05} = −1.645 and z_{.90} = 1.282 are
substituted in these expressions, each one is solved for C and the
results are equated, we get the equation


    .7 - 1.645\sqrt{\frac{.21}{N}} = .6 + 1.282\sqrt{\frac{.24}{N}}


Solving this equation for N gives

    N = \left(\frac{1.645\sqrt{.21} + 1.282\sqrt{.24}}{.1}\right)^2 = 191


as the estimated sample size necessary to meet the conditions
agreed upon.

To make the procedure general, let
P_H = the proportion under the hypothesis to be tested
P_A = the proportion under the alternative
α = significance level = probability of rejecting H when it is true
β = probability of accepting H when the alternative is true


(3.10)    Then   N = \left(\frac{z_{1-\alpha}\sqrt{P_H Q_H} + z_{1-\beta}\sqrt{P_A Q_A}}{P_H - P_A}\right)^2
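
The general procedure lends itself to a short routine. The sketch below assumes a one-sided test and the relation N = [(z₁₋α√(P_H Q_H) + z₁₋β√(P_A Q_A)) / (P_H − P_A)]², with Q = 1 − P; the function name `cases_for_test` is illustrative. Applied to the superstition study it reproduces the sample size of 191.

```python
from math import ceil, sqrt
from statistics import NormalDist

def cases_for_test(P_H, P_A, alpha, beta):
    """Approximate sample size for a one-sided large-sample test of H : P = P_H
    that is to have power 1 - beta against the alternative P_A."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)    # normal deviate cutting off alpha
    z_beta = NormalDist().inv_cdf(1 - beta)      # normal deviate cutting off beta
    numerator = z_alpha * sqrt(P_H * (1 - P_H)) + z_beta * sqrt(P_A * (1 - P_A))
    return (numerator / (P_H - P_A)) ** 2

# Superstition study: P_H = .70, P_A = .60, alpha = .05, beta = .10
n = ceil(cases_for_test(0.70, 0.60, 0.05, 0.10))
print(n)
```

Because the difference P_H − P_A is squared, the same routine serves whether the alternative lies below or above the hypothesized value, and it shows at once how rapidly the required sample grows as the alternative is brought closer to the hypothesis.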


EXERCISE 3.6 

1. In a random sample of 400 high school students in city A, 168 said
in a test of general information that Iraq is the capital of Iran. Using a
confidence coefficient of .99, make an interval estimate of the proportion
of high school pupils in the entire city who would make the same mistake.

2. A superintendent of schools has stated that at least 60% of high 
school seniors expect to attend college. In a random sample of 200 cases, 
only 96 say they are planning for college. Does this refute the superin- 
tendent's statement? (Use a one-sided test with a = .02.) 

3. How large a sample will be needed to test the hypothesis that
P < .30 at significance level .02 if it is agreed that the test shall have power
.80 when P = .35?

4. Answer the same question if it is agreed that the test shall have
power .90 when P = .50.

5. How large a sample will be needed to test the hypothesis P < .50 
at significance level .01 if it is agreed that the test shall have power .90 
when P = .52? 
Test of Hypothesis That Two Population Proportions Are
Equal When Each Is Estimated from a Large Number of
Observations and When the Estimated Common Proportion Is Not
Near 0 or 1. In a public opinion poll taken in Peekskill, N.Y.,
in 1946, Hedlund (reference 4) compared two methods of selecting a
sample of persons to be interviewed on certain matters concerning the
Peekskill public schools. The Control Sample consisted of 97
adults selected at random by interviewing one person at every
38th address listed in the street guide of the latest Peekskill
directory. The Experimental Sample consisted of 365 persons
selected by high school students from among their adult
acquaintances. The method of selecting individuals for the experimental
sample is obviously more economical. However, if the proportion
of persons reported to hold a given opinion differs significantly
from one sample to the other, the selection of the experimental
sample must be presumed to involve bias.

One of the questions asked by Hedlund was, “Would you be 
in favor of spending public money to add an evening school for 
adults to the services now offered by the Peekskill Public Schools?” 
The responses were obtained from populations of persons which 
may be the same or different in respect to their opinions on this 
issue. One may formulate the hypothesis that adult acquaint- 
ances of high school students have the same attitudes toward an 
evening school as the population of adults as a whole. 

There are two populations. One consists of the responses 
which might have been obtained by questioning all adults in 
Peekskill, and the other consists of responses which might have 
been obtained from all acquaintances of high school students in 
this city. Both are finite populations but large enough for sam- 
pling distribution theory based on an infinite population to apply. 
The mathematical description of the populations can be formu- 
lated as in the adjacent display. 


Class                              All adults    All acquaintances
Answering “Yes”                        P₁               P₂
Answering “No” or “No opinion”         Q₁               Q₂

The question to be asked about the two populations might be
phrased: Is the proportion of “Yes” responses the same in both?
Mathematically the hypothesis can be formulated as P₁ = P₂ = P
or as P₁ − P₂ = 0. A hypothesis which states that the difference
between two parameters is zero is called the null hypothesis.
Level of significance and location of critical region must be
decided before data are examined. Let us assume that α is to be
.05. The logic of the problem appears to call for a two-sided test,
because we want to detect a difference in either direction.

Let N₁ be the number of cases in the sample for which the
observed proportion is p₁ and let N₂ and p₂ be the corresponding
values in the other sample. Then in the combined samples the
proportion of “Yes” answers is

(3.11)  $$p = \frac{N_1 p_1 + N_2 p_2}{N_1 + N_2}$$

and the proportion not answering “Yes” is q = 1 − p. Let
N = N₁ + N₂. The required statistic is

(3.12)  $$z = \frac{p_1 - p_2}{\sqrt{\dfrac{pqN}{N_1 N_2}}}$$

As before, z denotes a variable with unit normal distribution.

Formula (3.12) requires an unnecessary amount of arithmetic
and runs the risk of unnecessarily large rounding errors.

Let n₁ be the number of individuals answering “Yes” in the
first sample and n₂ be the number of individuals answering “Yes”
in the second, and let

n = n₁ + n₂
N = N₁ + N₂

Then using the integers listed above instead of the decimals p₁
and p₂, Formula (3.12) can be reduced to

(3.13)  $$z = \frac{n_1 N_2 - n_2 N_1}{\sqrt{\dfrac{N_1 N_2\, n\,(N - n)}{N}}}$$


These two formulas will now be applied to the data from Hedlund’s 
study. Both the numbers and the proportions in the two classes 
are shown. 


                                       Number                         Proportion
Class                        Adults  Acquaintances  Both    Adults  Acquaintances  Both
Those answering “Yes”          62        241         303     .6392      .6598      .6558
Those not answering “Yes”      35        124         159     .3608      .3402      .3442
Entire group                   97        365         462    1.0000     1.0000     1.0000
By Formula (3.12)

$$z = \frac{.6392 - .6598}{\sqrt{\dfrac{(.6558)(.3442)(462)}{(97)(365)}}} = \frac{-.0206}{.0534} = -.386$$

By Formula (3.13)

$$z = \frac{(124)(62) - (35)(241)}{\sqrt{\dfrac{(97)(365)(303)(159)}{462}}} = -.389$$

Here the numerator is written as n₁(N₂ − n₂) − n₂(N₁ − n₁), which
equals n₁N₂ − n₂N₁.

As the numerator and denominator of the first computation are
carried to 3 significant digits only, an error must be expected
in the third digit of the result.

For large samples the statistic given in Formulas (3.12) and
(3.13) has a distribution which is approximately normal. If
neither N₁ nor N₂ nor n nor N − n is small the normal approximation
will be satisfactory. A correction to be used in small samples
will be found in Chapter 4 on page 106, where a statistic closely
related to this one is discussed.

Since it was agreed that α should be .05 we read z_.975 from the
normal probability table as 1.96 and z_.025 as −1.96. The region
of acceptance is therefore −1.96 < z < 1.96. The observed value
z = −.389 falls well within this region. The observed difference
in percentages is not large enough to throw any serious suspicion
on the hypothesis P₁ − P₂ = 0. Inasmuch as one method of taking
a sample is much easier than the other, and inasmuch as no
evidence has been found to indicate that the methods produce a
difference in proportions, the investigator is justified in using the
easier method of drawing the sample.

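
The integer form of the statistic lends itself to a short computation. The sketch below is only an illustration under stated assumptions: the function name `two_proportion_z` and the `big_n` spellings for N₁ and N₂ are ours, and `statistics.NormalDist` stands in for the normal probability table.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(n1: int, big_n1: int, n2: int, big_n2: int) -> float:
    """Formula (3.13): z from the counts, with no intermediate decimals."""
    n = n1 + n2                # "Yes" answers in the two samples combined
    big_n = big_n1 + big_n2    # all cases in the two samples combined
    numerator = n1 * big_n2 - n2 * big_n1
    denominator = sqrt(big_n1 * big_n2 * n * (big_n - n) / big_n)
    return numerator / denominator


# Hedlund's data: 62 of 97 adults and 241 of 365 acquaintances answered "Yes"
z = two_proportion_z(n1=62, big_n1=97, n2=241, big_n2=365)
critical = NormalDist().inv_cdf(0.975)   # about 1.96 for a two-sided test, alpha = .05
print(round(z, 3), "accept" if abs(z) < critical else "reject")
```

Working from the counts keeps all rounding to the final step, which is the advantage claimed for Formula (3.13).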


REVIEW EXERCISE 
1. The following terms have been introduced in this chapter. Be sure 
that the meaning of them is clear to you. 


acceptance value                 power function
alternative to a hypothesis      power of a test
confidence belt                  region of acceptance
confidence coefficient           region of rejection
confidence interval              region of significance
confidence limits                rejection value
consistent estimate              sample point
critical region                  size of a region
error of the first type          standard error
error of the second type         test
finite population                two-sided hypothesis
interval estimate                unbiased estimate
level of significance
null hypothesis
one-sided hypothesis

2. Translate into words each of the following symbolic expressions:
a. H: P = .3
b. E(p) = P
c. σ_p² = PQ/N
d. P(p = .7 | P = .6 and N = 40) = ?
e. C(.43 < P < .49) = .98


3. For each of several situations there is given the hypothesis to be
tested, the observed value of the statistic, the region of acceptance. Draw
a circle around the letter A or R to indicate whether the decision should
be to accept or reject the hypothesis. Draw a circle around the number I
or II to indicate which type of error you are certain has not been made.

                                                            Type of error
H                 p      Region of acceptance   Decision    which has not
                                                            been made
P = [illegible]   .82    .425 < p < .775        A  R        I  II
P = .40           .31    p < .50                A  R        I  II
P = .50           .59    .44 < p < .56          A  R        I  II
P > .80           .78    p > [illegible]        A  R        I  II
P = .42           .41    .40 < p < .44          A  R        I  II

REFERENCES

1. Clopper, C. J. and Pearson, E. S., “The Use of Confidence or Fiducial Limits
Illustrated in the Case of the Binomial,” Biometrika, 26 (1934), 404-413.

2. David, F. N., Probability Theory for Statistical Methods, 1949, Cambridge
University Press. Chapter 1, Fundamental Ideas. Chapter 2, Preliminary
Definitions and Theorems. Chapter 3, The Binomial Theorem in Probability.
Chapter 4, Evaluation of Binomial Probabilities.

3. Eisenhart, C., Hastay, M. W. and Wallis, W. A., Selected Techniques of Statistical
Analysis, New York, 1947, McGraw-Hill Book Company, Inc., 331-335.

4. Hedlund, Paul A., Measuring Public Opinion on School Issues, unpublished
Ed. D. thesis, 1947, on file in the Library of Teachers College, Columbia
University.

5. National Bureau of Standards, Applied Mathematics Series 6, Tables of the
Binomial Probability Distribution, Washington, D.C., 1949, U.S. Government
Printing Office.

6. Neyman, Jerzy, First Course in Probability and Statistics, New York, 1950,
Henry Holt and Company. Chapter 2, Probability.

7. Walker, H. M., Mathematics Essential for Elementary Statistics, New York,
1951, Henry Holt and Company, 2d ed. Chapter 19, The Binomial Expansion.

8. Wilks, S. S., Elementary Statistical Analysis, Princeton, 1949, Princeton
University Press. Chapter 10, Confidence Limits for Population Parameters.
Chapter 11, Statistical Significance Tests.




4 Chi-square 


The simplest type of population — and a very important 
type —is the dichotomy discussed in the preceding chapters. 
Many of the most fundamental concepts of statistical inference 
have already been developed in relation to this simple two-class 
population and its parameter P, and will now be applied to prob- 
lems from other types of population. In this chapter we shall 
study populations composed of several discrete classes. In later 
chapters we shall deal with populations on one or more continuous 
variables. A greater variety of questions can be asked — and 
answered — about such populations. Samples from them will 
furnish new statistics and some of those statistics will have sam- 
pling distributions other than the binomial and the normal distribu- 
tion, so that new probability tables must be employed. However 
we shall still be concerned with such basic ideas as the sampling 
distribution of a statistic, its expectation and its standard error, 
critical region, significance level, two types of error, and tests of 
hypotheses. 

Populations Consisting of Several Discrete Classes. Such 
populations are very common. Again we shall introduce the topic 
with a problem in which there are so few alternatives that the 
probabilities can be obtained by direct enumeration. The illus- 
tration we shall use is a study by an advertising agency to deter- 
mine which of three color tones is most desirable for a certain 
display. In order to ascertain the reaction of the public, the 
display is prepared in each of the three colors and is shown to a 
sample of persons. Each person is asked to state which of the 
three displays he prefers. 

The statistician in charge of the study would probably select 
a large sample of persons. A procedure for testing hypotheses 
with large samples will be described later. However, in order 
to provide a background for that procedure a calculation will 
first be carried through on the assumption that only three judges 
are used in the study. 
The possible responses of the 3 judges may be classified in
terms of the amount of agreement thus:

1. All three judges may vote for the same display. For
convenience we shall call this type of response 3, 0, 0. Since the
display for which they vote may be either I, II, or III, there are
3 ways in which they can make this type of response.

2. Two judges may agree on one display and the third judge
vote for a different one. For convenience we shall call this type
of response 2, 1, 0. It can be made in 18 different ways, which
are tedious but not difficult to enumerate. A few of these 18 ways
are listed here, the letters A, B, and C representing judges, and
the numerals I, II, and III the displays for which they vote.

Response     I      II     III
   1         A      -      BC
   2         B      AC     -
   3         AC     -      B

3. Each judge may select a different display. We shall call 
this type of response 1, 1, 1. It can be made in 3! = 6 different 
ways. 

The three types of response and their probabilities under the
hypothesis that a randomly selected judge is as likely to vote for
one display as for another, are therefore:

Type of      Number of ways in which     Probability under
response     response can be made        the hypothesis
1, 1, 1                 6                      .222
2, 1, 0                18                      .667
3, 0, 0                 3                      .111
                                              1.000
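
The enumeration for three judges is small enough to carry out mechanically. The following sketch is offered only as a check on the counts above; the helper name `response_type` is hypothetical, and the hypothesis of equally likely votes is assumed so that each of the 27 responses has the same probability.

```python
from collections import Counter
from itertools import product

DISPLAYS = ("I", "II", "III")

# Every way in which judges A, B, and C can each vote for one display.
responses = list(product(DISPLAYS, repeat=3))


def response_type(votes):
    """Reduce a response such as ('I', 'III', 'III') to its type (2, 1, 0)."""
    tallies = Counter(votes)
    return tuple(sorted((tallies[d] for d in DISPLAYS), reverse=True))


type_counts = Counter(response_type(votes) for votes in responses)
for kind, ways in sorted(type_counts.items()):
    print(kind, ways, round(ways / len(responses), 3))
```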


Assume now that the three judges are chosen randomly and
cast their votes independently and that all 3 vote for display II.
Can the advertising agency be fairly sure that the public in general
will prefer II? Could the agreement be a matter of chance? Let
us test the hypothesis that in the population preferences are
equally distributed so that a randomly selected judge is as likely
to vote for one display as for another and consequently any of
the 27 possible responses is as likely as any other. Then the
probability that all three judges will select the same display is
3/27 = .111. Since the highest agreement possible among 3 judges
has so large a probability of occurring by chance in the absence
of any real preference on the part of the general population, it
must be concluded that 3 judges are not enough to reach any
dependable conclusion. Let us therefore consider the same
problem with 6 judges.

Sampling Distribution of Response Types. We found there
were 3³ = 27 possible ways in which the votes of 3 judges could
be assigned to 3 displays, and that these 27 ways could be classified
under the three types 3, 0, 0; 2, 1, 0; and 1, 1, 1. With 6 judges
there will be 3⁶ = 729 possible ways in which their votes can be
allocated to the three displays.

Assume that the six judges are chosen randomly, and that of 
the six, four choose display I, one chooses display II and one 
display III. On the basis of these frequencies the statistician 
wishes to test the hypothesis that in the population, from which 
the judges are a sample, the preferences are equally distributed. 

The 729 possible responses can be classified into the 7 types
shown in the first column of Table 4.1.

TABLE 4.1 The Number of Possible Responses of Six Judges Each of Whom
Chooses any one of Three Displays Classified by Type of Response under the
Hypothesis that One Display is as Likely to be Chosen as Another.

             Number of           Probability     Cumulative
Type         responses in type   of type         probability
2, 2, 2            90              .1235           .1235
3, 2, 1           360              .4938           .6173
3, 3, 0            60              .0823           .6996
4, 1, 1            90              .1235           .8231
4, 2, 0            90              .1235           .9466
5, 1, 0            36              .0494           .9960
6, 0, 0             3              .0041          1.0001
TOTAL             729             1.0001

The number of different responses giving rise to each type is shown
in the second column of that table. The student is not expected to
verify these numbers but may be interested in knowing how they were
obtained. Consider the type 3, 2, 1. Three out of 6 judges vote for one
of the displays and these 3 judges can be selected in $\binom{6}{3} = 20$ ways.
Two out of the remaining judges vote for another display and they
can be chosen in $\binom{3}{2} = 3$ ways. The remaining judge will then
vote for the remaining display. The number of ways of grouping
votes of the 6 judges is therefore 20 × 3 × 1 = 60. However, the
votes of 3, 2, 1 can be assigned in 6 different ways to the 3
displays, and therefore the total number of possible responses which
produce this type is 6 × 60 = 360. The other frequencies are
arrived at in similar fashion, though it is not particularly
important for the student to compute them.
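
For the reader who does wish to verify the second column of Table 4.1, the counting argument given for type 3, 2, 1 can be written out generally. This is a sketch only; the function name `ways_for_type` is ours, and it simply multiplies the number of ways of grouping the judges by the number of distinct ways of assigning the group sizes to the displays.

```python
from collections import Counter
from math import factorial


def ways_for_type(pattern: tuple[int, ...]) -> int:
    """Number of responses that produce a type such as (3, 2, 1)."""
    # Ways of splitting the judges into groups of the given sizes.
    groupings = factorial(sum(pattern))
    for votes in pattern:
        groupings //= factorial(votes)
    # Distinct ways of assigning those group sizes to the displays;
    # repeated sizes, as in (4, 1, 1), are not counted twice.
    assignments = factorial(len(pattern))
    for repeats in Counter(pattern).values():
        assignments //= factorial(repeats)
    return groupings * assignments


types = [(2, 2, 2), (3, 2, 1), (3, 3, 0), (4, 1, 1), (4, 2, 0), (5, 1, 0), (6, 0, 0)]
table = {kind: ways_for_type(kind) for kind in types}
for kind, ways in table.items():
    print(kind, ways, round(ways / 3 ** 6, 4))
```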

Under the hypothesis that out of all judges who could be con- 
sulted the same proportion would vote for each display, so that a 
randomly selected judge is as likely to vote for one display as for 
another, the expected type of the votes of 6 judges is 2, 2, 2. In 
Table 4.1 the 7 types have been arranged according to the degree 
of their conformity to this hypothetical type. The first type 
shows complete conformity, the last one, 6, 0, 0, shows the greatest 
possible discrepancy. In the next section will be presented a
statistic by which that discrepancy can be measured. Clearly
one would be quite ready to reject the hypothesis if a sample 
yields the type 6, 0, 0, since the risk of error in falsely rejecting 
a true hypothesis would then be only .004. If the observed type 
were 5, 1, 0, the probability that this type or one showing even 
less agreement with the hypothetical frequencies should occur 
when the hypothesis is true is .049 + .004 = .053, and most in- 
vestigators would reject the hypothesis on the evidence of such 
data. 

In the problem as stated, the judges' votes were 4, 1, 1. There 
are 3 types of response which disagree with the hypothesis more 
than this one and the sum of the probabilities for these three is 
.177. The probability, under the hypothesis, of the observed 
pattern or something still more unlikely is .124 + .177 = .301. 
Clearly on the evidence from this response the advertising agency 
cannot assume display I to be more pleasing in general than dis- 
plays II and III. 

The calculation just described is quite laborious for six cases. 
In a more realistic situation, where many more than six cases 
would be studied, the calculation would become utterly forbidding. 
It is customary in such problems to compute a statistic which 
measures the discrepancy between observed and hypothetical 
frequencies and to study the sampling distribution of that
statistic. The statistic is known as χ², or chi-square.

Chi-square as a Measure of Discrepancy between Observed 
and Expected Frequencies. For convenience in writing formulas 
we shall adopt the following symbolism: 

fᵢ is the number, or frequency, observed in the ith class,

Fᵢ is the number, or frequency, expected in the ith class in
accordance with the proportions indicated by the hypothesis.

The expected frequencies are computed on the assumption that 
the cases in the sample are apportioned according to the hy- 
pothesis so that the total expected frequency is the same as the 
total observed. The three displays define three classes over which 
the judges’ votes are distributed. The frequencies expected under 
the hypothesis are then F₁ = 2, F₂ = 2, and F₃ = 2, while the
corresponding observed frequencies are f₁ = 4, f₂ = 1, and f₃ = 1.

The sum of all the differences fᵢ − Fᵢ will always be 0 since
Σfᵢ = ΣFᵢ. (The reader unfamiliar with the symbol Σ should refer
to page 111.) Therefore Σ(fᵢ − Fᵢ) cannot be used as a measure
of discrepancy. If these differences are squared before summing,
the negative signs will conveniently disappear. The size of
(fᵢ − Fᵢ)² needs to be taken in relation to the expected value Fᵢ
inasmuch as a difference of, say 5 when Fᵢ was, say 10 would be
much more important than a difference of 5 when Fᵢ was, say
100. (The formula was of course arrived at mathematically and
not by such intuitive argument as is advanced here.) Then the
statistic

$$\chi^2 = \frac{(f_1 - F_1)^2}{F_1} + \frac{(f_2 - F_2)^2}{F_2} + \frac{(f_3 - F_3)^2}{F_3}$$

or in general

(4.1)  $$\chi^2 = \sum \frac{(f_i - F_i)^2}{F_i}$$

measures discrepancy between observed and expected frequencies.
The computation is then as follows:

Class    fᵢ    Fᵢ    fᵢ − Fᵢ    (fᵢ − Fᵢ)²    (fᵢ − Fᵢ)²/Fᵢ
I         4     2       2           4             2.0
II        1     2      −1           1              .5
III       1     2      −1           1              .5
Sum       6     6       0                         3.0 = χ²
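
Formula (4.1) translates directly into a few lines of code. The sketch below is illustrative only; the function name `chi_square` is ours, and the check that the two totals agree reflects the requirement stated above that the total expected frequency equal the total observed.

```python
def chi_square(observed, expected):
    """Formula (4.1): sum of (f_i - F_i)^2 / F_i over the classes."""
    if sum(observed) != sum(expected):
        raise ValueError("observed and expected totals must agree")
    return sum((f - big_f) ** 2 / big_f for f, big_f in zip(observed, expected))


# Votes of the six judges: 4, 1, 1 observed against 2, 2, 2 expected
value = chi_square([4, 1, 1], [2, 2, 2])
print(value)  # 3.0
```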


The Greek letter χ² (or chi-square) was first used to describe
this statistic by Karl Pearson in 1900. Its use is practically
universal. In fact it is one of the very few statistical symbols
used by almost every writer. There is no parameter corresponding
to this statistic.

Chi-square is a discrete variable. It is never negative, since
each term in the numerator is a square and each denominator is
a positive number. If the observed frequencies should agree
completely with the hypothetical, χ² would be zero. χ² increases in
size as the observed frequencies depart more and more from the
hypothetical.

The reader should compute χ² for each of the response types
in Table 4.1 in the manner shown above for type 4, 1, 1 and
compare his results with the values shown in Table 4.2.

Exact Probabilities of Chi-square Obtained by Enumeration.
In order to decide what to do with the hypothesis we are testing
it is necessary to set up a critical region. In order to set up a
critical region for χ² it is necessary to know its sampling
distribution. We already have, in Table 4.1, the probability distribution
of response type under the hypothesis that all possible responses
are equally likely. However, response type is not a statistic which
has any general usefulness, whereas χ² can be used in a very wide
variety of situations to measure the discrepancy between observed
and expected frequencies. We shall therefore convert the
probability distribution for response type of Table 4.1 into the exact
sampling distribution for χ², merely by computing the value of
χ² corresponding to each response type. The two response types
4, 1, 1 and 3, 3, 0 are then found to yield the same value of
chi-square and so have been combined in Table 4.2. The probabilities
in Table 4.2 are those already given in Table 4.1. The cumulative
probability in the final column is the probability of obtaining
by chance a value of chi-square which does not exceed the value
listed in the χ² column.


TABLE 4.2 Values of χ² Measuring Discrepancy between a Response Type and
the Type 2, 2, 2, with Corresponding Probabilities and Cumulative Probabilities.
(Probabilities as in Table 4.1.)

                                              Cumulative
     Type                χ²    Probability    probability
A    2, 2, 2              0       .124           .124
B    3, 2, 1              1       .494           .617
C    4, 1, 1; 3, 3, 0     3       .206           .823
D    4, 2, 0              4       .124           .947
E    5, 1, 0              7       .049           .996
F    6, 0, 0             12       .004          1.000
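
The conversion from response types to the exact sampling distribution of χ² can also be obtained by brute force, listing all 729 responses and computing χ² for each. The sketch below assumes equally likely votes, as in the hypothesis; the variable names are ours, and exact fractions are used so that equal values of χ² are pooled without rounding trouble.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

N_JUDGES, N_DISPLAYS = 6, 3
expected = Fraction(N_JUDGES, N_DISPLAYS)   # 2 votes expected for each display

# Tally how many of the 3**6 = 729 responses give each value of chi-square.
distribution = Counter()
for votes in product(range(N_DISPLAYS), repeat=N_JUDGES):
    chi2 = sum((votes.count(d) - expected) ** 2 / expected for d in range(N_DISPLAYS))
    distribution[chi2] += 1

total = N_DISPLAYS ** N_JUDGES
cumulative = Fraction(0)
for chi2 in sorted(distribution):
    probability = Fraction(distribution[chi2], total)
    cumulative += probability
    print(int(chi2), f"{float(probability):.3f}", f"{float(cumulative):.3f}")
```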


Chi-square Curves and Table. To obviate the detailed and
laborious computation required to determine the distribution of
χ² as indicated in Tables 4.1 and 4.2, a set of smooth curves is
available, and probabilities calculated from areas under these
curves may be used as approximations to the exact χ²
distributions calculated by enumeration. The approximation is very
accurate for large samples. Good results from the use of these
curves may be obtained when the expected frequency in each
class is at least five. Figure 4-1 shows the relationship between
the χ² distribution as calculated by enumeration and the smooth curve.
The exact probabilities in this figure are obtained from Table 4.1,
and the smooth curve from the mathematical formula for the curve.

Fig. 4-1. Exact and smooth χ² distribution for samples of six cases
grouped in three classes.
Fig. 4-2. Exact and smooth cumulative χ² distribution for samples of
six cases grouped in three classes.

The relation of the smooth curve and the step curve can be
seen more clearly in the cumulative distributions of Figure 4-2.
The smooth curve gives a better approximation to the exact
probabilities when N is large than when N is small. To illustrate this
fact, a computation was made of the probabilities attaching to
the various values of χ² when 12 instead of 6 individuals are
distributed in three classes. These computations are similar in
nature to those reported in Tables 4.1 and 4.2. The figures will
not be given here but the resulting cumulative probability values
are shown graphically in Figure 4-3 with the same smooth χ² curve
which appears in Figure 4-2. The smooth curve depends upon the
number of classes and not on the number of individuals in those
classes. In Figure 4-2 the steps are larger and so the
probabilities read from the smooth curve and the step curve show greater
disagreement than in Figure 4-3. The smooth curve provides
probabilities which approximate fairly well the exact
probabilities of the step curve in Figure 4-3 for probability values of
.90 or more, and it is in the range of cumulative probabilities
from .90 to 1.00 that the accuracy of approximation becomes
important. As N increases, the exact probabilities come in general
closer and closer to the approximate values read from the smooth
χ² curve.

Fig. 4-3. Exact and smooth cumulative χ² distributions for samples of twelve
cases grouped in three classes. (Vertical axis: cumulative probability;
horizontal axis: scale of χ², 0 to 12.)
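
The comparison drawn in Figures 4-2 and 4-3 can be tabulated as well as graphed. The sketch below rests on one fact not stated in the text, namely that the smooth curve for three classes (2 degrees of freedom) has cumulative area 1 − e^(−x/2) to the left of x; the function names are ours.

```python
from fractions import Fraction
from math import exp, factorial


def exact_cumulative(n_cases: int) -> dict:
    """Exact P(chi-square <= x) for n_cases spread over 3 equally likely classes."""
    expected = Fraction(n_cases, 3)
    weights = {}
    for a in range(n_cases + 1):
        for b in range(n_cases - a + 1):
            c = n_cases - a - b
            ways = factorial(n_cases) // (factorial(a) * factorial(b) * factorial(c))
            chi2 = sum((f - expected) ** 2 / expected for f in (a, b, c))
            weights[chi2] = weights.get(chi2, 0) + ways
    running, cumulative = 0, {}
    for chi2 in sorted(weights):
        running += weights[chi2]
        cumulative[float(chi2)] = running / 3 ** n_cases
    return cumulative


def smooth_cumulative(x: float) -> float:
    """Area to the left of x under the smooth curve for 2 degrees of freedom."""
    return 1 - exp(-x / 2)


for n_cases in (6, 12):
    for x, exact in exact_cumulative(n_cases).items():
        print(n_cases, x, round(exact, 3), round(smooth_cumulative(x), 3))
```

Printing the step values beside the smooth values for N = 6 and N = 12 lets the reader judge the agreement in the upper range of cumulative probability for himself.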
The burden of studying hypotheses in populations consisting
of several classes can now be shifted to probabilities calculated, by
use of the calculus, from the smooth χ² curves. These
probabilities are tabulated in Table VIII in the Appendix.

Fig. 4-4. Smooth χ² curve for 4 degrees of freedom, with selected
percentiles indicated on the baseline.


We have already seen that what is called “the binomial dis- 
tribution” is actually not one distribution but a whole family 
of distributions, the members of that family differing from each 
other as P and N change. The chi-square curve is also not a 
single curve but a family of curves. In Table VIII each horizon- 
tal row relates to one particular curve in this family of smooth 
curves. Subscripts of χ² at the top of the table indicate per cents
of area under the curve. The relation of these per cents of area
to the tabular entries is illustrated in Figure 4-4 which shows
the curve for the horizontal row n = 4. Select one of the columns
in Table VIII, say the one headed χ²_.10, and read the tabular entry
in the given row and column. It is 1.1. The relation between
χ²_.10 and 1.1 may be expressed in any one of the following ways,
all of which have the same meaning, but with slightly different
emphasis:

10% of the area under this curve lies to the left of an
ordinate at χ² = 1.1
P(χ² < 1.1) = .10
P(χ² > 1.1) = .90
The 10th percentile of this distribution is χ² = 1.1
χ²_.10 = 1.1

The rows of Table VIII are distinguished by entries in the
column headed n. These values of n are known as degrees of
freedom. As each row relates to one particular curve in the set
of smooth curves, the form of the curve is seen to be determined
by its degrees of freedom.
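
A tabular entry such as χ²_.10 = 1.1 for n = 4 can be checked without the table. The sketch below relies on a closed form that is not given in the text and that holds only when the number of degrees of freedom is even: the area to the right of x is e^(−x/2) times the sum of (x/2)^j / j! for j from 0 to n/2 − 1. The function name is ours.

```python
from math import exp, factorial


def chi_square_area_left(x: float, degrees_of_freedom: int) -> float:
    """P(chi-square < x) under the smooth curve; valid for even degrees of freedom only."""
    if degrees_of_freedom <= 0 or degrees_of_freedom % 2:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = x / 2
    terms = degrees_of_freedom // 2
    area_right = exp(-half) * sum(half ** j / factorial(j) for j in range(terms))
    return 1 - area_right


area = chi_square_area_left(1.1, 4)
print(round(area, 3))
```

The area comes out slightly above .10 because the tabular entry 1.1 is carried to one decimal place only.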

Degrees of Freedom. The term “degrees of freedom” will
occur again and again, not only in connection with chi-square 
but also in a great variety of other problems. It will be worth 
while to pause now for clarification of that concept. 

Suppose you are asked to write 3 numbers with no restrictions 
upon them. You have complete freedom of choice in regard to 
all 3. There are 3 degrees of freedom. 

Now suppose you are asked to write 3 numbers with the re- 
striction that their sum is to be some particular value, say 20. 
You cannot now choose all 3 freely, but as soon as 2 have been 
chosen the third is determined. Your choices are governed by 
the necessary relation X₁ + X₂ + X₃ = 20. In this situation there
are only 2 degrees of freedom. The number of variables is 3,
but the number of restrictions upon them is 1, and the number
of “free” variables, or independent choices, is 3 − 1 = 2.

Now suppose you are asked to write 5 numbers such that their
sum is 30 and also such that the sum of the first two is 18. There
are 5 variables but you do not have freedom of choice with
respect to all 5. You cannot write 5 numbers arbitrarily and have
them conform to the 2 restrictions that X₁ + X₂ = 18 and

X₁ + X₂ + X₃ + X₄ + X₅ = 30.

As soon as you select X₁, then X₂ = 18 − X₁ and is completely
determined. Since X₃ + X₄ + X₅ = 30 − 18 = 12, only two of the
numbers X₃, X₄, and X₅ can be freely chosen. As one of the
numbers X₁ and X₂ can be freely chosen there are 3 free choices.
The number of degrees of freedom is n = 5 − 2 = 3.

In every statistical problem in which degrees of freedom are
involved it is necessary to determine the number of “free
variables” by first noting the total number of variables and reducing
that number by the number of independent restrictions upon
them. In the preceding paragraph, for instance, one might think
there are 3 restrictions, namely

X₁ + X₂ = 18; X₃ + X₄ + X₅ = 12; and X₁ + X₂ + X₃ + X₄ + X₅ = 30.

However, only two of these are independent, since any one of them
can be deduced from the other two. (For further discussion of
degrees of freedom, see references 11 and 12.)
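
The count of three free choices among five numbers can be made concrete. In the sketch below, which is an illustration only, the function name `complete_five` and the decision to treat X₁, X₃, and X₄ as the free numbers are ours; any other admissible choice of three would serve equally well.

```python
def complete_five(x1: float, x3: float, x4: float) -> list:
    """Fill in the two bound numbers once three have been chosen freely."""
    x2 = 18 - x1          # fixed by X1 + X2 = 18
    x5 = 12 - x3 - x4     # fixed by X3 + X4 + X5 = 30 - 18 = 12
    return [x1, x2, x3, x4, x5]


numbers = complete_five(7, 2.5, -4)
print(numbers, sum(numbers), numbers[0] + numbers[1])
```

Whatever three values are supplied, the remaining two are forced, and both restrictions are met.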

Returning now to the problem of the advertising agency we
note that the responses of six persons are distributed in three
classes. Let X₁, X₂, and X₃ be the numbers in those classes.
There are thus 3 variables. These are restricted by the relation
X₁ + X₂ + X₃ = 6. Only two of the variables can be chosen freely;
two are free and the third is bound. (Do not be troubled by the
fact that you are not free to choose a negative number, or a
fraction, or a number larger than 6 in this problem. Even though
choice of a variable can be made only from the digits 0, 1, 2, 3, 4,
5 or 6 the variable is still considered free.) If instead of 6 judges
there had been 1000 we should have had X₁ + X₂ + X₃ = 1000
and the number of degrees of freedom would still be n = 3 − 1 = 2.
Thus for all problems testing a hypothesis about the proportion
of cases falling in each of 3 classes, when no restriction is laid
down except that giving the total number of cases (here 6), the
chi-square table may be entered with 2 degrees of freedom. In
general, if there are k classes in such a problem there would be
k − 1 degrees of freedom, regardless of how many cases are
distributed in the k classes. Other types of problem with a variety
of numbers of restrictions and of degrees of freedom will be
considered in succeeding sections.

Reading the Chi-square Table. In Figure 4-5 are shown five
of the smooth χ² curves for which probabilities are listed in Table
VIII, namely, the curves for which n = 1, 2, 4, 8, and 10. It may
be helpful to consider the table and this figure together. Take,
for example, n = 3. Find those entries in row n = 3 of Table VIII
which may be interpreted symbolically to mean the following:

P(χ² < .35 | n = 3) = .05
P(χ² < 6.3 | n = 3) = .90

Locate .35 and 6.3 on the baseline of Figure 4-5 and draw an
ordinate at each point. The figure contains no curve for n = 3,
but for a curve lying between those for n = 2 and n = 4 the
proportion of area to the left of these ordinates appears to be
approximately .05 and .90 respectively.

Fig. 4-5. χ² curves for 1, 2, 4, 8, and 10 degrees of freedom.

It is very convenient to remember that E(χ²) = n. This is
almost the only guidepost which most people carry in their minds
with respect to the χ² distribution. It means that if one obtains
an observed value of χ² smaller than the number of degrees of
freedom involved in a problem, one usually feels satisfied to decide
at once, without consulting the probability table, that the
discrepancy between hypothetical and observed frequencies is negligible.
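
The guidepost E(χ²) = n can be tried on the exact distribution of Table 4.2, where three classes give n = 2. The sketch below takes the numbers of responses from Tables 4.1 and 4.2 as given; it illustrates the relation for this one case and is not offered as a proof.

```python
from fractions import Fraction

# Value of chi-square -> number of the 729 responses yielding it (Tables 4.1 and 4.2)
responses_by_chi2 = {0: 90, 1: 360, 3: 150, 4: 90, 7: 36, 12: 3}

total = sum(responses_by_chi2.values())
mean_chi2 = sum(Fraction(chi2 * ways, total) for chi2, ways in responses_by_chi2.items())
print(total, mean_chi2)  # 729 2
```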


EXERCISE 4.1
1. Verify the following statements by reference to Table VIII and
translate them into words:

a. P(χ² < 9.9 | n = 21) = .02
b. P(χ² < 43.2 | n = 27) = .975
c. P(χ² < 2.7 | n = 1) = .90
d. P(χ² > 30.6 | n = 15) = .01
e. P(χ² > 15 | n = 6) = .02
f. P(χ² > 8 | n = 16) = .95
g. P(χ² > 40.1 | n = 27) = .05
h. P(14.6 < χ² < 37.7 | n = 25) = .90
i. P(7.4 < χ² < 40.0 | n = 20) = .99
j. P(8.5 < χ² < 22.3 | n = 15) = .80
k. .01 < P(χ² < 14 | n = 28) < .02
l. .05 < P(χ² > 28 | n = 19) < .10
m. .005 < P(χ² > 42 | n = 22) < .01
n. .01 < P(χ² > 42 | n = 24) < .02
o. .025 < P(χ² > 20 | n = 11) < .05
p. P(χ² < 3.2 | n = 13) < .005
q. P(χ² > 32 | n = 10) < .001
2. Suppose you have decided that whenever the observed χ² is equal
to or greater than χ²_.95 you will reject whatever hypothesis you are testing
and whenever it is less than χ²_.95 you will accept that hypothesis. For each
of the situations described below, read the appropriate value of χ²_.95 from
the table and circle A or R to indicate whether the hypothesis under
consideration should be accepted or rejected, as illustrated in a.


n Xt X? 95 n x? Хә 
a 25. 37, КО) Е DB yuh АБУ уй е АК, R 
b... AY EE OT E SOM AG ТОКА 1R 
c. T. ES ЛОВОМ E WAL UU: 
d. поо ВАЕ itu DO Mines AR 
e. 15. Gli p ОЛАК ААК cu vt esp Rente M WA 


3. Carry out the instructions for the preceding question but using the
.01 significance level (that is, refer to χ²_.99) instead of the .05.
Compare the decisions as to disposition of the hypothesis reached in
the two situations.


Comparison of Observed and Theoretical Frequencies by
Means of χ². The problem of testing a hypothesis regarding the
proportion in the classes of a population has been discussed at
length for a small sample. Now that the χ² table is available,
problems involving a larger number of cases can be considered.

Suppose that in the study of preferences for advertising displays
60 cases have been used instead of 6. The hypothesis to
be tested is still that the proportions in the population are equal.
Then the 60 cases are expected to be distributed equally in their
preference. Hence F₁ = F₂ = F₃ = 20. Suppose that the observed
frequencies are 30, 18, 12, so that f₁ = 30, f₂ = 18, f₃ = 12.
Substituting these values into Formula (4.1) we compute the value of
chi-square as

$$\chi^2 = \frac{(30-20)^2}{20} + \frac{(18-20)^2}{20} + \frac{(12-20)^2}{20} = 8.4$$

The χ² table shows that this value leads to rejection of the
hypothesis at both the .05 and .02 levels, but to its acceptance at
the .01 level.
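The arithmetic of this example can be sketched in a few lines. The function name chi_square is hypothetical. The critical values 5.99, 7.82, and 9.21 are the usual tabled values of χ² for 2 degrees of freedom at the .05, .02, and .01 levels, entered here by hand rather than computed.

```python
def chi_square(observed, expected):
    """Sum of (f - F)^2 / F over all classes, as in Formula (4.1)."""
    return sum((f - F) ** 2 / F for f, F in zip(observed, expected))

observed = [30, 18, 12]      # f1, f2, f3
expected = [20, 20, 20]      # F1, F2, F3 under equal proportions

chi2 = chi_square(observed, expected)
print(round(chi2, 1))        # 8.4

# Tabled critical values for 2 degrees of freedom (typed in by hand)
critical = {0.05: 5.99, 0.02: 7.82, 0.01: 9.21}
for level, value in critical.items():
    decision = "reject" if chi2 >= value else "accept"
    print(level, decision)
```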

Problems leading to a comparison of observed with theoretical
frequencies arise when one wishes to determine whether an
observed sample has been drawn from a population having some
theoretical form such as the normal. This problem will be taken
up in Chapter 5.

EXERCISE 4.2

1. Several situations will be described briefly. No computations are
to be made, but in each the task is to decide upon a number denoted n
which is the number of degrees of freedom and upon another number
denoted N which is the number of observations.

(a) In a college class of 10 students, each student is asked to appraise
a text as either excellent (E), good (G), mediocre (M), or poor (P), and
their appraisals are to be used to test the hypothesis that college students
in general would show judgments evenly divided in the 4 categories. For
text A the appraisals were E, 0; G, 5; M, 4; P, 1. Then n = ___,
N = ___.

(b) Let the situation be the same as in question (a) except that there
are 50 students whose appraisals of text A are E, 3; G, 31; M, 12; P, 4.
Then n = ___, N = ___.

2. If the exact and the smooth χ² distributions were fitted to the data
of questions (a) and (b) in the preceding example, do you think the fit of
the smooth curve would be better in one situation than in the other?
If so, in which?

3. In a toss of 4 pennies the distribution of number of heads falling
upward has the following probabilities, obtainable from the binomial
distribution:

Number of heads      0      1      2      3      4
Probability         1/16   4/16   6/16   4/16   1/16

Using these probabilities, what are the expected frequencies in the five
classes, if the throw of four pennies is repeated 64 times?

What is χ² if the observed frequencies in 64 throws are

Number of heads       0     1     2     3     4
Observed frequency    2    12    23    20     7

Does the value of χ² warrant a judgment that one or more of the
pennies are biased?

4. Answer question 3 if the pennies were tossed 128 times and observed
frequencies of heads were 4, 24, 46, 40, and 14. The problem here is to
note what happens to χ² when every f_i and every F_i is doubled.

5. Answer question 3 if the pennies were tossed 64r times and the
observed frequencies of heads were 2r, 12r, 23r, 20r, and 7r. Make a
generalization as to effect on χ² of multiplying observed and theoretical
frequencies by r, keeping proportions unchanged.

Computing Chi-square from Data in Form of Per Cents.
When data are given in per cents rather than frequencies it is
convenient to compute χ² by the formula

$$\chi^2 = N \sum_i \frac{(p_i - P_i)^2}{P_i} \tag{4.2}$$

where p_i is the observed proportion in the ith class and P_i is the
expected proportion. If only per cents are given and the total
sample size N is unknown, χ² cannot be computed. To use per
cents as though they were frequencies would be to proceed as
though N were 100. If N is less than 100 this would give too
large a value of χ² and might lead to false rejection of a true
hypothesis more often than is indicated by the level of significance.
If N is greater than 100, χ² would be underestimated and the
hypothesis accepted more often than should be the case.
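The dependence of χ² on N can be illustrated with the advertising-display data used earlier, expressed as proportions. This is a sketch under stated assumptions: the function name chi_square_from_proportions is hypothetical, and the proportions .50, .30, .20 are simply the frequencies 30, 18, 12 divided by 60.

```python
def chi_square_from_proportions(p_obs, p_exp, n_cases):
    """Formula (4.2): N times the sum of (p - P)^2 / P."""
    return n_cases * sum((p - P) ** 2 / P for p, P in zip(p_obs, p_exp))

p_obs = [0.50, 0.30, 0.20]       # observed proportions (30, 18, 12 of 60)
p_exp = [1 / 3, 1 / 3, 1 / 3]    # hypothetical equal proportions

correct = chi_square_from_proportions(p_obs, p_exp, 60)
as_if_100 = chi_square_from_proportions(p_obs, p_exp, 100)

print(round(correct, 1))     # 8.4, agreeing with the frequency computation
print(round(as_if_100, 1))   # 14.0, too large because N is really 60
```

Treating the per cents as frequencies inflates χ² here in the ratio 100/60, which is the overstatement the text warns of when N is less than 100.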

Tests of Independence in Contingency Tables. The foregoing
discussion has dealt with tests of hypotheses that the population
proportions in several classes are in agreement with certain
hypothetical proportions. We shall now consider situations in
which individuals are classified according to two discrete variables,
and the problem deals with the relationship between those
variables.

As an example of such a situation consider data taken from
Rope's study of Opinion Conflict and School Support. Each of
1464 adult residents of Pittsburgh, Pa., was interviewed on a
variety of issues related to tax support. One of the questions
asked was, “Do you think tax money should, or should not, be
spent on nursery schools for children less than four and a half
years old?” Responses were classified as (1) favorable, (2) no
opinion, and (3) unfavorable. The problem was to determine
whether there is a relationship between type of response and age
of respondent. Each individual was then classified both according
to type of response and according to age. The resulting distribution
appears in Table 4.3.


TABLE 4.3 Frequency of Observed Response to the Question of Spending Tax
Money for Nursery Schools, Classified According to Nature of Response and Age
of Respondent.*

                        Group aged   Group aged    Group
                           20-34        35-54     over 54    Total
Favorable response          153          182         65       400
No opinion                   35           50         25       110
Unfavorable response        377          417        160       954
TOTAL                       565          649        250      1464

* Taken from Rope, Opinion Conflict and School Support.


A two-way distribution like that in Table 4.3 in which the
categories are discrete is called a contingency table. To study the
problem of relationship between the variables, the hypothesis of
independence is formulated. This hypothesis is then tested by
an application of the χ² method.

The method of dealing with the hypothesis of independence
will be developed in its most general form for a contingency table
with more than two classes in each of the variables. While this
method is applicable to all two-variable contingency tables,
simplifications of the computational techniques are available for
contingency tables in which (1) one of the variables has two
classes and the other more than two classes, and (2) each of the
variables has two classes.

a. More than two classes for each variable. This situation is 
illustrated by the data in Table 4.3. If the hypothesis of inde- 
pendence is true, then, in the population, the proportions in any 
one response row are equal to those in every other response row, 
and the proportions in any one age column are equal to the pro- 
portions in every other age column. Therefore, the best estimates 
of the population proportions in any row are the proportions 
indicated by column totals and the best estimates of the propor- 
tions in any column are the proportions indicated by the row 
totals. Either row totals or column totals can be used to obtain
expected values for a χ² test.

If we use row totals, then for any of the age groups the estimated
population proportions are as follows:


Favorable: 400/1464 
No opinion: 110/1464 
Unfavorable: 954/1464 


The expected frequencies for any age group can be computed 
by apportioning all the individuals in the age group according 
to these proportions. Thus for the 20-34 age group the expected 
frequencies are 


Favorable:     (400/1464)(565) = 154.4
No opinion:    (110/1464)(565) =  42.4
Unfavorable:   (954/1464)(565) = 368.2

                                 565.0


By use of the same proportions the individuals in the remaining
age groups may be apportioned similarly according to response
pattern. The expected values resulting from these computations
are displayed in Table 4.4.

TABLE 4.4 Expected Frequency of Response to the Question of Spending Tax
Money for Nursery Schools, on the Hypothesis that Age and Response are
Independent. (Data from Table 4.3.)

                        Group aged   Group aged    Group
                           20-34        35-54     over 54    Total
Favorable response         154.4        177.3       68.3      400
No opinion                  42.4         48.8       18.8      110
Unfavorable response       368.2        422.9      162.9      954
TOTAL                      565.0        649.0      250.0     1464
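The apportioning just described amounts to multiplying each row total by each column total and dividing by N. A minimal sketch follows, with the hypothetical function name expected_frequencies and the observed frequencies of Table 4.3 typed in by hand. The results agree with Table 4.4 to within rounding of the last digit.

```python
def expected_frequencies(table):
    """Expected cell frequencies under independence:
    (row total) x (column total) / N for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_cases = sum(row_totals)
    return [[r * c / n_cases for c in col_totals] for r in row_totals]

# Observed frequencies of Table 4.3 (rows: favorable, no opinion, unfavorable)
observed = [
    [153, 182, 65],
    [35, 50, 25],
    [377, 417, 160],
]

expected = expected_frequencies(observed)
for row in expected:
    print([round(F, 1) for F in row])
```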


To write formulas for χ² the following notation for the observed
and expected values will be adopted:

Observed Frequencies        Expected Frequencies

f₁₁   f₁₂   f₁₃             F₁₁   F₁₂   F₁₃
f₂₁   f₂₂   f₂₃             F₂₁   F₂₂   F₂₃
f₃₁   f₃₂   f₃₃             F₃₁   F₃₂   F₃₃

The formula for χ² can now be written as

$$\chi^2 = \frac{(f_{11} - F_{11})^2}{F_{11}} + \cdots + \frac{(f_{33} - F_{33})^2}{F_{33}} = \sum_i \sum_j \frac{(f_{ij} - F_{ij})^2}{F_{ij}}$$

From the values in Tables 4.4 and 4.3, χ² can be calculated by
Formula (4.1). However, a simpler computational routine is
provided by Formula (4.3).

$$\chi^2 = \sum_i \sum_j \frac{f_{ij}^2}{F_{ij}} - N \tag{4.3}$$

If data are in the form of proportions this formula becomes

$$\chi^2 = N \left( \sum_i \sum_j \frac{p_{ij}^2}{P_{ij}} - 1 \right) \tag{4.4}$$

Direct substitution in Formula (4.3) is carried out as follows:

$$\chi^2 = \frac{(153)^2}{154.4} + \frac{(35)^2}{42.4} + \cdots + \frac{(25)^2}{18.8} + \frac{(160)^2}{162.9} - 1464 = 4.1$$

The computed value of χ² = 4.1 measures the discrepancy between
the set of observed frequencies and the set of theoretical
frequencies. But is that discrepancy extreme? Is the distribution
of observed frequencies unusual in view of the hypothesis?
The number of possible patterns in which the 1464 cases could
be distributed in 9 classes is enormous. To compute the exact
distribution and obtain from it the probability of a frequency
distribution at least as exceptional as the one observed, as was
done for a small sample in the first part of this chapter, would be a
very large task. The smooth χ² curve with the appropriate number
of degrees of freedom will provide a close approximation to the
exact probability.

The number of degrees of freedom is the number of classes
less the number of restrictions. When there are r rows and c
columns the number of classes is rc. There is one restriction for
each of the r rows, namely

$$f_{i1} + f_{i2} + \cdots + f_{ic} = F_{i1} + F_{i2} + \cdots + F_{ic}$$

This equation means that for any one row, there is a free choice
of frequency for all cells but one (that is, for c - 1 cells) and the
frequency of that cell must be whatever is needed to make the
sum of the f_ij for the row equal to the sum of the F_ij for that row.
Similarly there is one restriction for each of the c columns, so that
in that column there is free choice of frequency for all cells but
one (that is, for r - 1 cells). However, there are not r + c different
restrictions but only r + c - 1, for any one of them can be obtained
from the others.

The number of classes minus the number of restrictions is
therefore rc - (r + c - 1) = rc - r - c + 1 = (r - 1)(c - 1). Also
the number of degrees of freedom is the number of classes for
which there is a free choice of frequency. In any one row there
is such free choice for c - 1 classes. If c - 1 frequencies are fixed
arbitrarily in r - 1 of the rows, the frequencies for all classes in
the remaining row will be foreordained. Therefore the number of
classes for which there is free choice of frequency is

$$n = (r - 1)(c - 1) \tag{4.5}$$

which is the number of degrees of freedom in problems of this
type.


For the data of Table 4.3, n = (3 - 1)(3 - 1) = 4. In the χ² table
for 4 degrees of freedom, the tabled value
which is next larger than the observed value is χ²_.75 = 5.4. If all
possible samples of the given size were drawn from a population
with the given theoretical frequencies, over 25% of those
samples would have frequency distributions for which χ² was
greater than the observed χ². This sample is indeed not exceptional
under the hypothesis. It is appropriate to assume that
the three age groups are homogeneous with respect to opinion
on the use of tax money for nursery school. It would not be
appropriate to seek for a possible explanation as to why the per
cent of favorable response was larger (28.1%) for the group aged
35-54 than for the older group (26.0%) inasmuch as it has been
shown that chance provides a reasonable explanation for the
observed differences.
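The whole test of independence for Table 4.3 can be sketched as follows. The function name independence_chi_square is hypothetical. The sketch computes χ² both by the defining formula and by Formula (4.3), using unrounded expected frequencies, so the result may differ slightly from the 4.1 obtained above with expectations rounded to one decimal.

```python
def independence_chi_square(table):
    """Return chi-square by the defining formula, chi-square by
    Formula (4.3), and the degrees of freedom (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n_cases = sum(row_totals)
    by_definition = 0.0
    by_shortcut = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            F = row_totals[i] * col_totals[j] / n_cases  # expected frequency
            by_definition += (f - F) ** 2 / F
            by_shortcut += f ** 2 / F
    by_shortcut -= n_cases
    df = (len(table) - 1) * (len(table[0]) - 1)
    return by_definition, by_shortcut, df

# Observed frequencies of Table 4.3
observed = [
    [153, 182, 65],
    [35, 50, 25],
    [377, 417, 160],
]

chi2_a, chi2_b, df = independence_chi_square(observed)
print(round(chi2_a, 2), round(chi2_b, 2), df)
```

Both routes give the same value, which is well below the tabled 5.4 for 4 degrees of freedom, in agreement with the decision reached in the text.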

b. Two classes for one variable, more than two for the other. This 
type of problem may also be illustrated with data from Rope's 
Opinion Conflict and School Support. In his sample of 1464 per- 
sons, 707 were men and 757 women. To the question “Some say 
that in future years the only way the schools can keep up the 
services they are giving today is to increase taxes. If this is true, 
should school services be cut or taxes increased?" they expressed 
opinions distributed as in Table 4.5. Are these data consistent
with the hypothesis that the per cent holding opinions in each of
the three categories is unrelated to the sex of respondents?


TABLE 4.5 Frequency of Observed Response to the Question of Increasing
Taxes to Support Current School Services, Classified by Type of Response and
Sex of Respondent.*

                                    Men    Women    Total
Approve increasing taxes            296     236      532
Have no opinion                     183     295      478
Do not approve increasing taxes     228     226      454
TOTAL                               707     757     1464

* Data taken from Rope, Opinion Conflict and School Support.


In this type of problem χ² can be computed exactly as it was
computed in the preceding section, but there is available an
equivalent routine which involves less numerical work. Let a_i and b_i
be frequencies in the two groups in the ith class and n_i = a_i + b_i.

Let Σ a_i = A, Σ b_i = B, Σ n_i = N

Then A + B = N. There are 2k classes. The number of degrees
of freedom is n = k - 1. The usual formula for χ² can be shown
to reduce to a special formula, valid only for the case described
in the heading of this paragraph or described alternately as the
study of the independence of two traits when one of them is a
dichotomous trait.

$$\chi^2 = \frac{\sum_i \dfrac{a_i^2}{n_i} - \dfrac{A^2}{N}}{\dfrac{A}{N} \cdot \dfrac{B}{N}} \tag{4.6}$$

The numerical computations for the data of Table 4.5 are shown in
Table 4.6. The student should verify them by carrying through
the same procedure using the b_i frequencies instead of the a_i.
He may also wish to use Formula (4.3) to convince himself that
the method of Formula (4.6) is algebraically equivalent to the
general method.

With an observed χ² of 31.3 when χ²_.999 is only 13.8, it is clear
that men and women cannot be presumed to hold similar opinions
concerning the issue under discussion.


Table 4.6 Computation of χ² from the Data of Table 4.5.

          a_i        n_i       a_i/n_i     a_i²/n_i
          296        532       .55639       164.69
          183        478       .38284        70.06
          228        454       .50220       114.50
Sum     A = 707    N = 1464                 349.25

A/N = 707/1464 = .48292
B/N = 757/1464 = .51708

Σ a_i²/n_i - A²/N = 349.25 - 341.43 = 7.82

χ² = 7.82 / ((.48292)(.51708)) = 31.3

n = (3 - 1)(2 - 1) = 2
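The routine of Table 4.6 may be sketched as below. The function name chi_square_two_groups is hypothetical, and the formula coded is the form χ² = (Σ a_i²/n_i - A²/N) / ((A/N)(B/N)), which is the form consistent with the columns of Table 4.6. The second call carries out the check the text recommends, using the b_i in place of the a_i.

```python
def chi_square_two_groups(a, b):
    """Chi-square for a k-by-2 table from the frequencies a_i of one
    group and b_i of the other, following the routine of Table 4.6."""
    n = [ai + bi for ai, bi in zip(a, b)]       # class totals n_i
    A, B = sum(a), sum(b)
    N = A + B
    numerator = sum(ai ** 2 / ni for ai, ni in zip(a, n)) - A ** 2 / N
    return numerator / ((A / N) * (B / N))

men = [296, 183, 228]      # approve, no opinion, do not approve
women = [236, 295, 226]

chi2_from_men = chi_square_two_groups(men, women)
chi2_from_women = chi_square_two_groups(women, men)
df = len(men) - 1          # n = k - 1

print(round(chi2_from_men, 1), round(chi2_from_women, 1), df)
```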


c. Two classes for each variable. From the data described in
the preceding paragraph there might be made a test of the
hypothesis that the per cent of persons undecided is the same among
men as among women. The observed per cents holding “no
opinion” are 25.88% for men and 38.97% for women. The frequency
distribution presented as a double dichotomy is shown in
Table 4.7. The two computing routines previously described are
each applicable here and the student may gain practice by applying
them. However the method about to be described will produce
the same outcome with less computation.


TABLE 4.7 Frequency of Observed Response to the Question of Increasing
Taxes to Support Current School Services, Classified by Indecision and Sex of
Respondent.*

                              Men    Women    Total
Holding definite opinion      524     462      986
Holding no opinion            183     295      478
TOTAL                         707     757     1464

* Data taken from Rope, Opinion Conflict and School Support.

The four frequencies will be denoted a, b, c, and d, with marginal
frequencies a + b, c + d, a + c, and b + d, and N = a + b + c + d.

     a        b      a + b
     c        d      c + d
   a + c    b + d      N

The number of degrees of freedom is n = 1.
The general formula for χ² reduces in this case to the special
formula

$$\chi^2 = \frac{(ad - bc)^2 N}{(a + b)(c + d)(a + c)(b + d)} \tag{4.7}$$

Here ad - bc = (524)(295) - (462)(183) = 70034 and

$$\chi^2 = \frac{(70034)^2 (1464)}{(986)(478)(707)(757)} = 28.5$$

whereas for 1 degree of freedom χ²_.999 is only 10.8.

The hypothesis of equal per cents of undecided persons in the
two populations is untenable in face of such data.
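As a check on the arithmetic, the special formula for the two-by-two table can be compared with the general computation from expected frequencies. Both function names below are hypothetical, and the sketch assumes nothing beyond the four cell frequencies of Table 4.7.

```python
def chi_square_2x2(a, b, c, d):
    """Special formula (4.7) for a two-by-two table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_general(table):
    """General computation: sum of (f - F)^2 / F over all cells."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, f in enumerate(row):
            F = row_totals[i] * col_totals[j] / n
            total += (f - F) ** 2 / F
    return total

special = chi_square_2x2(524, 462, 183, 295)
general = chi_square_general([[524, 462], [183, 295]])

print(round(special, 1))     # 28.5
print(round(general, 1))     # 28.5
```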

Relation of χ² to the Statistic of Formula (3.12). The reader
will recall that in Chapter 3, page 78, a problem very similar to
the one of the preceding paragraph was solved by obtaining the
difference between observed per cents in two independent samples,
dividing that difference by its estimated standard deviation and
referring the result to a table of the normal probability curve. It
may now be stated that the two methods lead to the same result.

For a large sample if χ² has 1 degree of freedom, √χ² has a
distribution which is the right-hand half of a normal distribution. The
general proof of the preceding statement involves calculus and so
will not be presented here. However the algebraic equivalence of

$$\chi^2 = \frac{(ad - bc)^2 N}{(a + b)(c + d)(a + c)(b + d)} \quad\text{and}\quad z^2 = \frac{(p_1 - p_2)^2}{pq\,\dfrac{N_1 + N_2}{N_1 N_2}}$$

is very easily established by sheer algebraic manipulation, using
either the substitution

N₁ = a + c,   N₂ = b + d,   N = N₁ + N₂
p₁ = a/N₁,   p₂ = b/N₂,   p = (a + b)/N

or the substitution

N₁ = a + b,   N₂ = c + d,   N = N₁ + N₂
p₁ = a/N₁,   p₂ = c/N₂,   p = (a + c)/N

The equivalence of the two methods for the data of Table 4.7
may be seen by computation as follows:

Let p = 986/1464 = .6735 and q = 1 - p = 478/1464 = .3265

pq(N₁ + N₂)/(N₁N₂) = (.3265)(.6735)(1464)/((707)(757)) = .0006015

p₁ = 524/707 = .7412

p₂ = 462/757 = .6103

z² = (.2588 - .3897)²/.0006015 = .0171348/.0006015 = 28.49

which agrees with the value previously given for χ². (The
difference is the same in absolute value whether the proportions
holding a definite opinion, .7412 and .6103, or those holding no
opinion, .2588 and .3897, are used.)

Inasmuch as the normal distribution is tabulated for finer
intervals of the argument than the chi-square distribution, it is
a satisfactory procedure to compute χ² by Formula (4.7), to
obtain z by taking the square root of χ², and to refer the result
to a table of normal probability. If the logic of the problem seems
to relate to the difference of two per cents, the outcome may be
discussed as a test of the hypothesis P₁ = P₂. If the logic seems
to relate to the independence of two variables the outcome may
be discussed in those terms.
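The agreement of the two methods can be seen numerically in the sketch below. The function names are hypothetical. The normal tail area is obtained from math.erfc, which stands in here for a table of normal probability and is an assumption of this illustration rather than the book's procedure.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Difference of two observed proportions divided by its
    estimated standard deviation under the hypothesis P1 = P2."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)               # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q * (n1 + n2) / (n1 * n2))
    return (p1 - p2) / se

def chi_square_2x2(a, b, c, d):
    """Special formula (4.7) for a two-by-two table."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

# Table 4.7: 524 of 707 men and 462 of 757 women hold a definite opinion
z = two_proportion_z(524, 707, 462, 757)
chi2 = chi_square_2x2(524, 462, 183, 295)

# Two-sided normal probability of a deviate at least as large as |z|
two_sided = math.erfc(abs(z) / math.sqrt(2))

print(round(z ** 2, 2), round(chi2, 2))
print(two_sided)
```

The small difference from the 28.49 of the hand computation arises only from rounding the intermediate values there to four decimals.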

Comparison of Two Proportions Based on the Same Indi- 
viduals. An interesting type of problem arises when the same 
individuals are measured on two different occasions and a com- 
parison of the per cents registering a given characteristic on the 
two occasions is required. Suppose, for example, that in an 
elementary school a test of racial prejudice is administered to 
200 pupils of whom 78 return highly prejudiced answers. A movie
designed to reduce prejudice is shown them a week or so later,
and after another day the same test is readministered, this time
with only 52 pupils showing highly prejudiced answers. None of
the methods previously described in this chapter can be used,
for in all of them the per cents to be compared were based on
different individuals.

Let  b + d = number of children showing prejudice on first test
     c + d = number of children showing prejudice on second test
Then b = number showing prejudice on first but not on second
and  c = number showing prejudice on second but not on first

We are interested in the hypothesis that in the population the
proportion showing prejudice only on the first test is equal to the
proportion showing it only on the second. This hypothesis is
tested by computing

$$\chi^2 = \frac{(b - c)^2}{b + c} \tag{4.8}$$

with 1 degree of freedom.
It is therefore necessary to have not only the two figures b + d = 78
and c + d = 52 but also at least one of the four cell entries.
Suppose we know that 30 pupils showed prejudice on both tests,
so d = 30.

Then b = 78 - 30 = 48

     c = 52 - 30 = 22

χ² = (48 - 22)²/(48 + 22) = 9.66

z = √χ² = √9.66 = 3.1

The reduction from p₁ = 78/200 = .39 to p₂ = 52/200 = .26 is highly
significant and cannot be attributed to chance.
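The computation for the prejudice example is brief enough to sketch directly. The function name and its argument names are hypothetical. It takes the two marginal counts and the number showing the characteristic on both occasions, as in the text.

```python
import math

def paired_chi_square(first_total, second_total, both):
    """Formula (4.8) for two proportions based on the same individuals.

    first_total  = b + d, second_total = c + d, both = d.
    """
    b = first_total - both      # on first occasion only
    c = second_total - both     # on second occasion only
    return b, c, (b - c) ** 2 / (b + c)

b, c, chi2 = paired_chi_square(78, 52, 30)
z = math.sqrt(chi2)

print(b, c)                  # 48 22
print(round(chi2, 2))        # 9.66
print(round(z, 1))           # 3.1
```

Only the b + c = 70 pupils who changed their answers enter the computation. The 30 who were prejudiced on both occasions and the 100 who were prejudiced on neither do not affect χ².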

The Two-by-two Contingency Table in Very Small Samples. 
That a smooth curve does not provide a very close approximation 
to exact probabilities when samples are quite small has been noted 
in Chapter 3 and in the early part of this chapter. In Chapter 3 
the normal curve was seen to provide a good approximation to 
the binomial for large values of N but not for small. In this 
chapter the smooth chi-square curve was seen to provide a good 
approximation to the exact χ² probabilities for large N and not
for small. This situation raises the question as to how to deal 
with problems similar to those discussed in the preceding section 
when the samples are quite small. The question of what is meant 
by “quite small” will be disregarded for the moment but dis- 
cussed later on. 

In his study of Adolescent Fantasy, Symonds asked adolescents
to write a story about each of a set of pictures he presented
to them. Each story was then read and categorized as using or
not using a particular theme such as violence, sex, bad companions,
illness, etc. One of the questions in which he was interested
was the possible discovery of a sex difference with respect to the
use of a particular theme. Each child was classified as using the
theme if it occurred in at least two of his stories and as not using
it if it occurred in one or none of his stories.

Symonds used the same number of boys and girls. In order 
that the student may see the method in a more general form with- 
out the special condition introduced by keeping the size of the 
two samples equal, the original data will be changed slightly. 
Suppose that 16 boys and 18 girls have been included in such an 
experiment, and that of the boys 7 have used the theme of violence 
while of the girls only 1 has used it. The corresponding observed 
proportions are p₁ = 7/16 = .4375 and p₂ = 1/18 = .0556. Is the
difference between these proportions compatible with the hypothesis
that in the population the proportions are equal, P₁ = P₂ = P?

It is convenient to present the observed results in the form of 
a contingency table, as follows: 


                                    Boys    Girls    Total
Using the theme of violence           7       1        8
Not using the theme of violence       9      17       26
TOTAL                                16      18       34

We shall now consider the hypothesis that sex is unrelated to
the tendency to use a theme of violence and shall assume that all
samples have the same marginal totals:

                     Boys    Girls
Using theme            a       b        8
Not using theme        c       d       26
                      16      18       34


Under this restriction there are 9 sets of possible values which 
can be taken by the cell frequencies a, b, c and d, and these are 
listed as arrangements A to J in Table 4.8. The probability of 
any one arrangement is given by the formula

$$\frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{N!\,a!\,b!\,c!\,d!} \tag{4.9}$$

The probability for each of the 9 possible arrangements has
been worked out in Table 4.8.

TABLE 4.8 All Possible Arrangements of Cell Frequencies in a Two-by-Two
Table with Fixed Marginal Frequencies, and the Probability Associated with Each
Arrangement.

Arrangement     a     b     c     d     Probability
A               8     0     8    18       .00071
B               7     1     9    17       .01134
C               6     2    10    16       .06748
D               5     3    11    15       .19631
E               4     4    12    14       .30674
F               3     5    13    13       .26427
G               2     6    14    12       .12270
H               1     7    15    11       .02804
J               0     8    16    10       .00241
TOTAL                                    1.00000

The probability distribution of Table 4.8 might be represented
graphically in the manner of Figures 3-1 and 3-2. Now suppose
.05 has been selected as the level of significance and the critical
region is to be so chosen that it will subtend .025 of the area in
each tail. Then it is clear that arrangements A and B with
probabilities .00071 + .01134 = .01205 fall in the critical region at one
end of the distribution and that arrangement J falls into the
critical region at the other. Any one of these three arrangements
would be held incompatible with the hypothesis; none of the
other six would be considered as throwing suspicion on that
hypothesis. Since the observed data have shown arrangement B,
the data present evidence of a sex difference with boys tending to
use the theme of violence oftener than girls.
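The factorial arithmetic is tedious by hand but easily sketched by machine. The function names below are hypothetical. The sketch evaluates the factorial formula for the probability of one arrangement, for every arrangement compatible with the fixed marginal totals 8, 26, 16, and 18, and assumes only Python's exact integer factorials.

```python
from math import factorial

def arrangement_probability(a, b, c, d):
    """Probability of one two-by-two arrangement with fixed margins,
    by the factorial formula for exact probabilities."""
    n = a + b + c + d
    numerator = (factorial(a + b) * factorial(c + d)
                 * factorial(a + c) * factorial(b + d))
    denominator = (factorial(n) * factorial(a) * factorial(b)
                   * factorial(c) * factorial(d))
    return numerator / denominator

def all_arrangements(row1, row2, col1):
    """Map each possible value of cell a to its probability, given the
    two row totals and the first column total."""
    col2 = row1 + row2 - col1
    result = {}
    for a in range(max(0, row1 - col2), min(row1, col1) + 1):
        b = row1 - a
        c = col1 - a
        d = row2 - c
        result[a] = arrangement_probability(a, b, c, d)
    return result

# 8 users and 26 non-users of the theme; 16 boys and 18 girls
probabilities = all_arrangements(8, 26, 16)
for a in sorted(probabilities, reverse=True):
    print(a, round(probabilities[a], 5))

# Tail containing the observed arrangement (a = 7) and the one beyond it
upper_tail = probabilities[8] + probabilities[7]
print(round(upper_tail, 5))
```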

Because of the arithmetic involved the method just described 
can be recommended only for very small samples. Certain aids 
are available. A table of the logarithms of factorials is to be 
found in Statistical Tables * by Fisher and Yates. 

In small samples the usual computation of χ² gives too large
a value, leading to rejection of the hypothesis more often than
would the direct computation of probability by factorials. This
error can be offset by a procedure commonly known as Yates’
correction. The procedure is to change the frequency in
each cell by .5, keeping the marginal totals unchanged and
reducing the size of χ². Thus observed frequencies

      5 | 10
      7 |  2

with χ² = 4.4 would be changed to

    5.5 | 9.5
    6.5 | 2.5

with χ² = 2.8. The same effect can be produced more easily without
rewriting the frequencies by subtracting ½N from the absolute value
of ad - bc. Thus in the illustration N = 24 and ad - bc = 10 - 70 = -60,
and |ad - bc| - ½N = 60 - 12 = 48. The vertical bars around a number
indicate its numerical value without regard to sign.

If χ_c² represents a value of χ² adjusted by Yates’ correction,
then

$$\chi_c^2 = \frac{(|ad - bc| - N/2)^2 N}{(a + b)(a + c)(b + d)(c + d)} \tag{4.10}$$

Verify that applying Formula (4.10) to the frequencies on the left
gives the same result as applying the usual χ² formula to the
adjusted frequencies on the right.

a.    10 |  4          9.5 |  4.5
       2 |  4          2.5 |  3.5

b.     5 |  4          5.5 |  3.5
      15 |  2         14.5 |  2.5

c.    20 |  6         19.5 |  6.5
       5 | 15          5.5 | 14.5
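The equivalence asserted for Formula (4.10) may be sketched for the illustration with cell frequencies 5, 10, 7, and 2. All three function names are hypothetical. The helper adjust moves every cell by .5 in the direction that reduces |ad - bc|, which leaves the marginal totals unchanged, and it assumes ad differs from bc.

```python
def chi_square_2x2(a, b, c, d):
    """Usual two-by-two formula, (ad - bc)^2 N over the product of margins."""
    n = a + b + c + d
    return (a * d - b * c) ** 2 * n / ((a + b) * (c + d) * (a + c) * (b + d))

def chi_square_yates(a, b, c, d):
    """Formula (4.10): subtract N/2 from |ad - bc| before squaring."""
    n = a + b + c + d
    return ((abs(a * d - b * c) - n / 2) ** 2 * n
            / ((a + b) * (a + c) * (b + d) * (c + d)))

def adjust(a, b, c, d):
    """Change each cell by .5, keeping marginal totals unchanged."""
    if a * d > b * c:
        return a - 0.5, b + 0.5, c + 0.5, d - 0.5
    return a + 0.5, b - 0.5, c - 0.5, d + 0.5

cells = (5, 10, 7, 2)
uncorrected = chi_square_2x2(*cells)
corrected = chi_square_yates(*cells)
from_adjusted = chi_square_2x2(*adjust(*cells))

print(round(uncorrected, 1))     # 4.4
print(round(corrected, 1))       # 2.8
print(round(from_adjusted, 1))   # 2.8
```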
When the expected frequency in every cell is large, application
of Yates’ correction will have a negligible effect on χ². The
following working rule is suggested: (1) If the significance level
obtained from χ² is larger than the predetermined significance
level, the hypothesis may be retained. (2) If the significance
level obtained from χ² is smaller than the predetermined
significance level, the “corrected” χ_c² should be computed. If the
probability obtained from it is also smaller than the significance
level, the hypothesis may be rejected. (3) If decisions based on
χ² and on “corrected” χ_c² are in disagreement, the exact
probability should be calculated by the method described on page 103.

Use of the Chi-square Approximation when Expectations Are
Small. Use of the table of areas under the chi-square curve as an
approximation to the discrete chi-square distribution is justified
mathematically when the number of cases in the sample is large.
The research worker, who needs some exact definition of the word
“large” in this situation, can find no simple answer to his
question. The discussion in this section is directed toward
finding practical rules that are reasonable and clear.

The word “large” relates to the number of cases expected (not 
the number observed) in a cell. Theory requires that all expecta- 
tions shall be “large.” It is commonly stated that the chi-square 
curve provides an adequate approximation when the least expec- 
tation in any cell is five. This requirement seems excessively high. 

Figure 4-3, which is based on samples of only 12 cases in three 
cells, the expected value per cell under the hypothesis tested being 
only 4, shows fairly close approximation of the values computed 
by the chi-square curve to the exact chi-square probabilities. 
Similar results were obtained by Neyman and Pearson on the
basis of ten cases in three cells. Cochran discusses the situation
when the expected number of cases is small for one of the cells
and large for all the remaining cells and the number of degrees
of freedom is two or more. In this situation the χ² table will
provide a good fit even if the expectation in the one cell is as small
as one.

The following practical rules of thumb are suggested for test- 
ing significance by use of the tables of areas under the chi-square 
curve. 

1. If there is only 1 degree of freedom, follow the suggestion 
previously given for the use of Yates’ correction. 

2. If there are 2 or more degrees of freedom and the expecta- 
tion in each cell is more than 5, the chi-square table assures a good 
approximation to the exact probabilities. 

3. If there are 2 or more degrees of freedom and roughly 
approximate probabilities are acceptable for the test of signif- 
icance, an expectation of only 2 in a cell is sufficient. 

4. If there are more than 2 degrees of freedom and the
expectation in all the cells but one is 5 or more, then an expectation
of only one in the remaining cell is sufficient to provide a fair
approximation to the exact probabilities.

5. If the logic of the problem permits, combine some of the 
classes to increase the expectations in the cells when several cells 
have very small expectations. 

Summary. Three new ideas are presented in this chapter:
(1) the idea of χ² as a measure of discrepancy between a set of
observed frequencies and the corresponding frequencies expected
under some hypothesis, (2) the idea of a contingency table, and
(3) the idea of degrees of freedom. Several methods of computing
the statistic χ² are presented and it is important to understand
the circumstances under which each may be used. The principal
skill developed is the ability to use a table of the χ² distribution.


5 Populations and Samples on a
Continuous Variable


The methods considered in the first four chapters are 
suitable for obtaining inferences from data classified in discrete 
classes. In dealing with data arising from measurement of a 
continuous variable these methods are insufficient, although the 
logic used is in its general outlines the same as that already de- 
scribed. This chapter will present some general concepts regarding 
distributions on a continuous variable. It will present symbolism, 
formulas for obtaining the mean and standard deviation of a 
sample, and methods for testing the hypothesis that a popula- 
tion distribution is normal. A student who has had an introduc- 
tory course should be able to read pages 111 to 117 very rapidly; 
others will need practice with symbolism and computation. 

Mathematical Model of the Population. The first populations
considered in this book were dichotomies. For these the mathematical
model consisted of a definition of the two discrete classes
and a statement that the proportion of cases in one class was P
and in the other Q. Later P and Q were called the probabilities 
of these classes and the population distribution thus specified was 
recognized as a probability distribution. Subsequently we con- 
sidered populations having more than two discrete classes, the 
mathematical model for such being a definition of the classes and 
a statement of the probability of each class. For populations on 
a continuous variable, probabilities are defined by areas under a 
curve. There are many forms of probability curves of which 
the most widely known is the normal curve. Practice in computing 
probabilities as segments of area between two ordinates of the 
normal curve has already been given in Chapter 2. 

The Mean and Variance of a Population. A population may 
be considered to have a variety of characteristics, among which 
are the mean and the standard deviation. If X represents the 
variable, these have already been defined as 


$$\mu = E(X) \quad \text{and} \quad \sigma = \sqrt{E(X - \mu)^2}$$

If the probability distribution of the population is known exactly,
μ and σ can be calculated by means of the integral calculus, the
procedure being analogous to that used in Formulas (2.4) and
(2.8) on pages 28 and 29 for discrete distributions. If the proba-
bility distribution is not known exactly but some information
about its nature is available, the mean and variance or standard
deviation can be estimated from a sample. Thus for probability
distributions known to have the binomial form, the estimation
of the mean was discussed in Chapter 3. Methods of estimating
characteristics of continuous populations will be discussed in later
sections.

Normal Populations. The normal distribution was described
in Chapter 2 as an approximation to the binomial distribution
and was used in Chapter 3 to compute probabilities of samples.
An important use of the normal distribution is to provide a mathe-
matical description for populations. A population so described is
called a normal population. Most of the methods discussed in the
following chapters, with the exception of those in Chapter 18,
are based on the assumption that observations are drawn from a
normal population. Some aspects of the normal distribution not
previously considered will now be described.*

The normal probability curve is a smooth distribution with
unlimited range in each direction and with probability density
given by the equation

(5.1)  $$y = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(X - \mu)^2}{2\sigma^2}}$$

y is the ordinate of the curve, called the probability density. 

X is the variable for which (5.1) furnishes the probability distri- 
bution, and is distributed along the horizontal axis in all the 
graphs of the normal curve presented in this text. 

π is the familiar number with approximate value 3.1416.

e is a number with approximate value 2.718.

μ = E(X) is the expected value of X. Being the mean of the
normal distribution, it determines the location of the distribu-
tion along the scale of X. It may be positive, negative or
zero. The mean, median and mode of the normal curve are
identical in value.


* Readers who are not mathematicians and who desire further discussion of the 
equation of the normal curve than is given here may consult pages 245-248 of Walker, 
Mathematics Essential for Elementary Statistics.

σ² = E(X − μ)² and may have any positive value. The standard
deviation σ of X is the unit of measure for the horizontal axis.
The area under the curve of Formula (5.1) is unity.

The ordinate y depends not only on the particular value of
X which defines the point where the ordinate is erected but also
on the values of μ and σ. When μ and σ are given par-
ticular values, an ordinate y can be calculated for each X and a
curve represented by these ordinates can be drawn. For another
pair of values of μ and σ another curve is obtained. Thus Formula
(5.1) represents a family of curves. Variables like μ and σ, specific
values of which determine a particular member of a family of
curves, are called parameters.

The symbols e and π represent numbers which always have
the same value. The symbols μ and σ represent numbers which
are constant for any one curve but vary from one curve to another.
The symbols y and X represent variables which fluctuate even
for one particular curve.
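
The roles of the constants, parameters, and variables in Formula (5.1) can be illustrated with a minimal sketch in Python. It is not part of the original text; the function names `normal_density` and `area_under_curve`, and the trapezoidal rule used to approximate the area, are assumptions made only for illustration.

```python
import math


def normal_density(x, mu, sigma):
    """Ordinate y of the normal curve of Formula (5.1) at the point x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def area_under_curve(mu, sigma, half_width=10.0, steps=20000):
    """Trapezoidal approximation to the area between mu - 10 sigma and mu + 10 sigma.

    The true curve has unlimited range; the tails beyond 10 sigma are
    negligible, so the result should be very close to unity.
    """
    lower = mu - half_width * sigma
    step = 2.0 * half_width * sigma / steps
    interior = sum(normal_density(lower + k * step, mu, sigma) for k in range(1, steps))
    ends = 0.5 * (normal_density(lower, mu, sigma)
                  + normal_density(lower + steps * step, mu, sigma))
    return (interior + ends) * step


# Three members of the family: same constants e and pi, different parameters.
for mu, sigma in [(0.0, 1.0), (0.0, 2.0), (3.0, 0.5)]:
    peak = normal_density(mu, mu, sigma)
    print(f"mu={mu}, sigma={sigma}: peak ordinate {peak:.4f}, "
          f"area {area_under_curve(mu, sigma):.6f}")
```

Changing μ shifts the curve along the scale of X, changing σ alters its spread and the height of its peak, and the area remains unity in every case.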

In Figure 5-1 are three normal curves corresponding to three
pairs of values of μ and σ, drawn on the same scale of values of X.

Fig. 5-1. Three normal curves with same area but
different values of μ and σ.

Strictly speaking a sample cannot have a normal distribution.
As soon as actual values are observed, those values have some
finite range instead of the unlimited range of the normal curve;
the values if plotted take the form of a histogram instead of a
smooth curve; and sampling irregularities distort the form of the
curve slightly. Therefore the normal curve is best considered
as a probability distribution describing either a population dis-
tribution or the sampling distribution of a statistic but not describ-
ing an observed distribution of a sample.

Use of Summation Sign. One of the most widely used symbols
in statistical literature is the large Greek letter sigma, Σ, corre-
sponding to a capital S. It is called the summation sign. Before
giving the formulas for the mean and the standard deviation of a
sample, the use of the summation sign may well be reviewed.

If X_1 represents the score of the first individual, X_i the score
of the ith, and N the entire number of individuals in a sample,

$$X_1 + X_2 + \cdots + X_i + \cdots + X_N = \sum_{i=1}^{N} X_i$$

$\sum_{i=1}^{N} X_i$ may be read “the sum of the scores of individuals 1 to N,” or
“the sum of values X_i where i ranges from 1 to N,” or “summation
X extending from X_1 to X_N.”

The purpose of the variable subscript and the limits of sum- 
mation is clarity. They may be omitted if they are not needed 
to prevent confusion. In elementary work summation is almost
always over the N individuals in a sample and therefore because 
the limits of summation are assumed to be from 1 to N they are 
usually not written. In later sections of this book beginning with 
Chapter 9 observations may be classified on several traits simul- 
taneously and formulas may be ambiguous unless the limits of 
summation are indicated. 

The formulas for the mean and standard deviation of a sample 
when the observations are classified in step intervals may be clari- 
fied by means of subscripts. For example, suppose 20 individuals 
have the following scores: 


2, 4, 9, 6, 3, 5, 6, 6, 3, 3, 7, 5, 7, 4, 3, 8, 8, 6, 5, 4 
The sum of these scores may be indicated as

$$\sum_{i=1}^{20} X_i = 104$$

or as

$$\sum X_i = 1(9) + 2(8) + 2(7) + 4(6) + 3(5) + 3(4) + 4(3) + 1(2) = 104$$


When the mean of these 20 scores is computed in the customary
manner for grouped data, each class index X_j is multiplied by the
frequency in the class f_j and the results are summed, Σf_j X_j. But
here the variable subscript j does not run from 1 to N but takes
only as many values as there are class intervals, which in this
example is 8. Therefore we write either

$$\sum_{i=1}^{20} X_i = 104 \quad \text{or} \quad \sum_{j=1}^{8} f_j X_j = 104$$
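
The two ways of writing the same sum can be checked with a short Python sketch (not part of the original text). The variable names are illustrative assumptions; the 20 scores are those listed above.

```python
from collections import Counter

scores = [2, 4, 9, 6, 3, 5, 6, 6, 3, 3, 7, 5, 7, 4, 3, 8, 8, 6, 5, 4]

# Summation over the N = 20 individuals (variable subscript i).
sum_over_individuals = sum(scores)

# Summation over the k distinct class values (variable subscript j),
# each class value X_j weighted by its frequency f_j.
frequencies = Counter(scores)
number_of_classes = len(frequencies)
sum_over_classes = sum(f_j * x_j for x_j, f_j in frequencies.items())

print(sum_over_individuals, sum_over_classes, number_of_classes)
```

The subscript i runs over 20 individuals while j runs over only 8 classes, yet both summations give the same total.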


Note carefully the effect of parentheses in the following:

$$\sum_{i=3}^{5} (X_i + a) = (X_3 + a) + (X_4 + a) + (X_5 + a) = X_3 + X_4 + X_5 + 3a = \sum_{i=3}^{5} X_i + 3a$$

$$\sum_{i=3}^{5} (X_i + Y_i) = (X_3 + Y_3) + (X_4 + Y_4) + (X_5 + Y_5) = (X_3 + X_4 + X_5) + (Y_3 + Y_4 + Y_5) = \sum_{i=3}^{5} X_i + \sum_{i=3}^{5} Y_i$$

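EXERCISE 5.1
1. Study the summations represented by the following symbols and
their equivalents:

$$\sum_{i=3}^{6} X_i = X_3 + X_4 + X_5 + X_6$$

$$\sum_{i=10}^{12} a x_i = a x_{10} + a x_{11} + a x_{12} = a(x_{10} + x_{11} + x_{12})$$

$$\sum_{i=6}^{10} (X_i - 5) = (X_6 - 5) + (X_7 - 5) + (X_8 - 5) + (X_9 - 5) + (X_{10} - 5) = X_6 + X_7 + X_8 + X_9 + X_{10} - 25$$

$$\sum_{i=6}^{8} 3y_i^2 = 3y_6^2 + 3y_7^2 + 3y_8^2 = 3(y_6^2 + y_7^2 + y_8^2)$$

$$\sum_{i=5}^{8} 2(Z_i - b) = 2(Z_5 - b) + 2(Z_6 - b) + 2(Z_7 - b) + 2(Z_8 - b) = 2(Z_5 + Z_6 + Z_7 + Z_8) - 8b$$

$$\sum_{i=7}^{9} x_i y_i = x_7 y_7 + x_8 y_8 + x_9 y_9$$

$$\sum_{j=1}^{4} f_j X_j = f_1 X_1 + f_2 X_2 + f_3 X_3 + f_4 X_4$$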

2. Write an equivalent expression for each of the following, making 
use of the summation sign: 

а. Хх. + Xo Xs+ Хә 

b. 425 + 42% + 427 + 425 

с. 3az«ye + 327: + Szays 

d. (Ys — A) + (Ys — A) + (Y: — A) + (Ys – A) 

e. ЛХ + faXa + fsXs + fiXa + fsXs + fes 

3. Assume the following individual scores: 

Individual Lo 2 SO Anc EO SCHOLL 42 

Score X ОБ 35,46) $6 MDE a 
Find the numerical value of the following expressions:


Niel 7. 


12 4 Mab 
b. X=% yx f. XQ — 2)? ($ у 
i= i= M 
12 э у саан 
© 10-3) Е. (X x ) RENE 


12 12 
4. Y (X; - Xy h. 45 3x) 


=1 
4. Assume that each of N = 10 individuals has been measured on 
variable X and also on variable Y with results as follows: 
Individual 1 2 3 4 5 6 7 8 9 10 
Score on X 7 9 + 3 6 10 7 8 2 4 
Score on Y 2 1 0 1 3 4 5 0 3 1 


Find the numerical value of the following expressions: 


a Y=5 SEO Ya " چ‎ 
N i=l i=l 
ae coe x, Sox SOY) h Ў ‚„_ (ХИ)? 
агаа * A ў ч Ў AT UN 
e" N N 
ex: f. YQu-Xy i. EQ. Yy 
=1 i=l i=l 


Definition of Mean, Variance, and Standard Deviation for a 
Sample. In any sample there is likely to be duplication of scores. 
Even when no two observations are precisely equal, it is usually 
convenient to group observations into intervals at least some of 
which have frequencies larger than 1. Both of these situations 
will be referred to as situations in which data are grouped. When 
working at a computing machine it is often convenient to take 
scores one at a time as they come, without grouping them. Such
situations will be referred to as computation without grouping. 

Definitional formulas for the mean and variance will now be 
given for both grouped and ungrouped data. Computational 
formulas will be given in the following section. The standard 
deviation is defined as the square root of the variance and does 
not require a separate formula. 


(5.2)  Mean, without grouping   $$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

(5.3)  Mean, with grouping   $$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{N}$$

In Formula (5.3) each of the k different values of X_j may be
thought of as defining a class. Then f_j/N is the proportional
frequency in that class and may be denoted p_j for the sample or
P_j for the population. Then

(5.4)  $$\bar{X} = \sum_{j=1}^{k} p_j X_j$$

(5.5)  Let $x_i = X_i - \bar{X}$ or $x_j = X_j - \bar{X}$

Then the variance without grouping is

(5.6)  $$s^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1} = \frac{\sum_{i=1}^{N} x_i^2}{N - 1}$$

and the variance with grouping is

(5.7)  $$s^2 = \frac{\sum_{j=1}^{k} f_j (X_j - \bar{X})^2}{N - 1} = \frac{\sum_{j=1}^{k} f_j x_j^2}{N - 1}$$

Some readers will be accustomed to using N rather than N − 1
in the denominator of variance and standard deviation. The 
reason for using N — 1 is that it leads to a more satisfactory 
estimate of the population variance as will be explained in later 
sections of this chapter. N — 1 is the number of degrees of freedom 
associated with the variance. 
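
The definitional formulas can be sketched in Python as follows. This is an illustration rather than part of the original text: the function names are assumptions, exact fractions are used so that grouped and ungrouped results can be compared without rounding, and the data are the 20 scores used earlier in the discussion of the summation sign.

```python
from collections import Counter
from fractions import Fraction


def mean_ungrouped(scores):
    """Formula (5.2): sum of the N scores divided by N."""
    return Fraction(sum(scores), len(scores))


def variance_ungrouped(scores):
    """Formula (5.6): sum of squared deviations from the mean, divided by N - 1."""
    m = mean_ungrouped(scores)
    return sum((x - m) ** 2 for x in scores) / (len(scores) - 1)


def mean_grouped(freq):
    """Formula (5.3): freq maps each class value X_j to its frequency f_j."""
    n = sum(freq.values())
    return Fraction(sum(f * x for x, f in freq.items()), n)


def variance_grouped(freq):
    """Formula (5.7): frequency-weighted squared deviations, divided by N - 1."""
    n = sum(freq.values())
    m = mean_grouped(freq)
    return sum(f * (x - m) ** 2 for x, f in freq.items()) / (n - 1)


scores = [2, 4, 9, 6, 3, 5, 6, 6, 3, 3, 7, 5, 7, 4, 3, 8, 8, 6, 5, 4]
freq = Counter(scores)

print(float(mean_ungrouped(scores)), float(variance_ungrouped(scores)))
print(float(mean_grouped(freq)), float(variance_grouped(freq)))
```

Both routes give a mean of 5.2 and a variance of 366/95, or about 3.85, since grouping identical scores changes only the bookkeeping.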

Formulas for the Computation of the Mean, Variance, and 
Standard Deviation for a Sample. In computation it would 
usually be an uneconomical procedure to subtract the mean from 
each individual score. Instead deviations are taken from some
arbitrarily chosen value as origin and the computation is com- 
pleted by a formula which is algebraically equivalent to the 
definition formula. 

Taking the origin at zero is particularly convenient when com-
putation is made by machine. Deviations from zero are called
“gross scores” or “raw scores.” The variance of ungrouped
scores with origin at zero may be computed from Formula (5.8)

(5.8)  $$s^2 = \frac{\sum X_i^2 - (\sum X_i)^2 / N}{N - 1}$$

or by the equivalent Formula (5.8a) which has the advantage that
no rounding errors occur until the final division by N(N − 1).

(5.8a)  $$s^2 = \frac{N \sum X_i^2 - (\sum X_i)^2}{N(N - 1)}$$

For grouped scores with origin at zero the equivalent formulas are

(5.9)  $$s^2 = \frac{\sum f_j X_j^2 - (\sum f_j X_j)^2 / N}{N - 1}$$

(5.9a)  $$s^2 = \frac{N \sum f_j X_j^2 - (\sum f_j X_j)^2}{N(N - 1)}$$

Taking deviations from an arbitrary origin near the middle
of the distribution reduces the size of the numbers involved in
the computations, and is therefore an advantage when the work
is to be done with pencil and paper. Grouping scores into in-
tervals and coding the intervals further reduces the size of the
numbers involved. The width of the interval is often called i
and will be so denoted here. It will require a little attention
on the part of the student to distinguish the variable subscript
i from the letter i which indicates the width of the step interval.
As has been noted before, there are not enough letters in the al-
phabet to provide a unique letter for every use and it is often
necessary to use the same letter in more than one meaning. If
any confusion arises it will quickly subside.

(5.10)  Let $i x'_j = X_j - A$

where i is the width of the interval and A is an arbitrary origin.
When deviations are taken from an arbitrary origin A the mean
is computed by formula

(5.11)  $$\bar{X} = A + i \left( \frac{\sum f_j x'_j}{N} \right)$$

and the variance by

(5.12)  $$s^2 = i^2 \left[ \frac{\sum f_j {x'_j}^2 - (\sum f_j x'_j)^2 / N}{N - 1} \right]$$

In computing the standard deviation it is obviously simpler to
multiply by i after taking the square root unless the variance
also is needed.

The computation of the mean and variance from grouped data 
is shown in Table 5.1. 


TABLE 5.1  Computation of Mean and Standard Deviation from Grouped Data

Interval    X_j    f_j    x'_j    f_j x'_j    f_j x'_j²
   1         2      3      4         5           6
 78-82      80      3      4        12          48
 73-77      75      6      3        18          54
 68-72      70      7      2        14          28
 63-67      65     12      1        12          12
 58-62      60     15      0         0           0
 53-57      55     13     -1       -13          13
 48-52      50      9     -2       -18          36
 43-47      45      7     -3       -21          63
 38-42      40      4     -4       -16          64
 33-37      35      2     -5       -10          50
 28-32      30      1     -6        -6          36
 Total             79              -28         404

$$\bar{X} = 60 + 5\left(\frac{-28}{79}\right) = 60 - 1.77 = 58.23$$

$$s^2 = 25\left[\frac{404 - (-28)^2/79}{78}\right] = 25(5.0523) = 126.31$$

$$s = 5\sqrt{5.0523} = 11.24$$
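
The coded computation of Table 5.1 can be reproduced with a short Python sketch. It is offered as an illustration only; the function name `coded_mean_variance` and its argument names are assumptions, while the midpoints and frequencies are those of the table.

```python
import math

midpoints = [80, 75, 70, 65, 60, 55, 50, 45, 40, 35, 30]
frequencies = [3, 6, 7, 12, 15, 13, 9, 7, 4, 2, 1]


def coded_mean_variance(midpoints, frequencies, origin, width):
    """Mean and variance from grouped data by Formulas (5.10) to (5.12).

    Each midpoint X_j is coded as x'_j = (X_j - origin) / width.
    """
    n = sum(frequencies)
    coded = [(x - origin) / width for x in midpoints]
    sum_fx = sum(f * c for f, c in zip(frequencies, coded))
    sum_fx2 = sum(f * c * c for f, c in zip(frequencies, coded))
    mean = origin + width * sum_fx / n
    variance = width ** 2 * (sum_fx2 - sum_fx ** 2 / n) / (n - 1)
    return mean, variance


mean, variance = coded_mean_variance(midpoints, frequencies, origin=60, width=5)
print(round(mean, 2), round(variance, 2), round(math.sqrt(variance), 2))
```

Choosing a different arbitrary origin, say 55, changes the coded values in columns 4 to 6 but leaves the mean and variance unchanged.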
EXERCISE 5.2


1. Compute the mean and variance of the following scores, taking an
arbitrary origin at A = 40. Scores: 42, 37, 49, 45, 42


Solution
   X     X − 40    (X − 40)²
  42        2          4
  37       -3          9
  49        9         81
  45        5         25
  42        2          4
 Sum       15        123

$$\bar{X} = 40 + \frac{15}{5} = 43 \qquad s^2 = \frac{123 - (15)^2/5}{4} = \frac{78}{4} = 19.5$$

2. Compute the mean and standard deviation of the following scores
without grouping, taking an arbitrary origin at A = 80.
Scores: 74, 79, 87, 91, 76, 79, 85, 81, 80.

3. Compute the mean and variance for each of these sets of numbers:

a. 15, 10, 19, 6, 5

b. 2, 4, -3, -7, 6, 7

c. 0, 3, 2, 0, 5, 7, 1

4. For the data of the worked example on page 117 compute the mean 
and standard deviation making use of a different arbitrary origin. 


Sampling Distributions. The sampling distributions of the 
mean X̄, of the variance s², of the standard deviation s, and of
other statistics will be needed in this and later chapters for the
purpose of making inferences about populations. Because a large 
number of new statistics will be used, certain concepts regarding 
sampling distributions already introduced in Chapter 3 will be 
described in the following sections. 

The population from which a sample is drawn is often called 
the parent population. Theoretically if the parent population has 
been described mathematically by some hypothesis the exact
sampling distribution of any statistic can be derived by mathe- 
matical methods. Actually the derivation of sampling distribu- 
tions often presents considerable mathematical difficulty, so that 
the exact sampling distribution of a particular statistic may not 
be available for practical use even though the population dis- 
tribution can be mathematically described. 

When the exact sampling distribution is not known, or is too
complex for practical use, an approximate distribution may be
available. Examples of such approximation have already been
met when the normal curve was used as an approximation for the
binomial, and the smooth χ² curve was used to approximate the
discontinuous exact χ² distribution. Such approximations will
also be found useful for sampling from populations on a con-
tinuous variate.

The standard deviation of the sampling distribution of a sta-
tistic is commonly called the standard error of the statistic. Thus
σ_p is the standard error of a proportion and σ_X̄ the standard error
of a mean. In Chapter 2 the mean and standard error of a pro-
portion were given as

$$\mu_p = E(p) = P \quad \text{and} \quad \sigma_p = \sqrt{E(p - P)^2} = \sqrt{\frac{PQ}{N}}$$

In Chapter 6 the mean and standard error of a mean will be given
as

$$\mu_{\bar{X}} = E(\bar{X}) = \mu \quad \text{and} \quad \sigma_{\bar{X}} = \sqrt{E(\bar{X} - \mu)^2} = \frac{\sigma}{\sqrt{N}}$$


These cases illustrate the important generalization that the mean 
and standard error of a statistic can be calculated directly from char- 
acteristics of the population and may be known even though the exact 
sampling distribution of the statistic is not known. Early work in 
statistical theory placed great emphasis on the discovery of means 
and standard errors of statistics, and some textbooks still state 
formulas for standard errors without giving any information about 
the sampling distribution. Usually in such cases it is tacitly 
assumed that the sampling distribution is normal. Any text 
which gives the formula for or or e; without any statement about 
the non-normality of the sampling distributions involved is living 
in the past. One distinctive aspect of modern statistical theory 
is its persistent search for the exact sampling distributions of 
statistics and satisfactory approximations to these distributions.

Unbiased Estimate of a Parameter. Suppose that it is de- 
sired to estimate some population parameter by means of a sample 
statistic. If the parameter is the expectation of the statistic, then 
the statistic is an unbiased estimate of the parameter. Thus p 
is an unbiased estimate of P and X̄ an unbiased estimate of μ.

It is now possible to explain why N − 1 is used instead of N
in the definition of the sample variance, s². It can be demonstrated
mathematically that s² is an unbiased estimate of σ², or
in symbols

(5.13)  $$E\left[\frac{\sum (X - \bar{X})^2}{N - 1}\right] = \sigma^2$$

while

(5.14)  $$E\left[\frac{\sum (X - \bar{X})^2}{N}\right] = \frac{N - 1}{N}\,\sigma^2$$

Therefore Formula (5.14) gives a biased estimate of σ².
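
Formulas (5.13) and (5.14) can be verified exactly for a small artificial population by listing every possible sample. The Python sketch below is an illustration and not part of the original text; the three-value population, the sample size of 3, and the function names are assumptions chosen to keep the enumeration short. Sampling is with replacement, so that all ordered samples are equally likely.

```python
from fractions import Fraction
from itertools import product


def population_variance(population):
    """sigma^2 = E(X - mu)^2 for a population of equally likely values."""
    mu = Fraction(sum(population), len(population))
    return sum((x - mu) ** 2 for x in population) / len(population)


def expected_variance_statistics(population, n):
    """Expected values of sum(X - Xbar)^2 / (n - 1) and of sum(X - Xbar)^2 / n.

    Every ordered sample of size n drawn with replacement is enumerated
    and the two statistics are averaged over all of them.
    """
    samples = list(product(population, repeat=n))
    total_n_minus_1 = Fraction(0)
    total_n = Fraction(0)
    for sample in samples:
        mean = Fraction(sum(sample), n)
        sum_sq = sum((x - mean) ** 2 for x in sample)
        total_n_minus_1 += sum_sq / (n - 1)
        total_n += sum_sq / n
    return total_n_minus_1 / len(samples), total_n / len(samples)


population = [2, 4, 9]          # hypothetical population, for illustration only
sigma2 = population_variance(population)
e_unbiased, e_biased = expected_variance_statistics(population, n=3)
print(sigma2, e_unbiased, e_biased)
```

Here σ² = 26/3. The statistic with divisor N − 1 averages exactly 26/3 over the 27 possible samples, while the statistic with divisor N averages 52/9, which is (N − 1)/N times σ².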

Test of Normality Applied to a Distribution of Sample Vari- 
ances. The data for this computation were obtained by students 
in a class in statistics studying sampling. Each student made 
16 throws of ten pennies thus obtaining samples for which N = 16, 
and computed the variance of each sample. The distribution of 
the variances from 256 such samples is shown graphically in Fig-
ure 5-2, with a superimposed normal curve having the same mean
and standard deviation. The histogram appears a little more
peaked than the normal curve. A test of normality employing χ²
will now be applied to these data.


Fig. 5-2. Frequency distribution of 256 sample variances,
with normal curve superimposed.


TABLE 5.2  Distribution of 256 Sample Variances

Variance    f      Variance    f      Variance    f
5.3-5.4     1      3.7-3.8     3      2.1-2.2    33
5.1-5.2     1      3.5-3.6     8      1.9-2.0    22
4.9-5.0     1      3.3-3.4     7      1.7-1.8    19
4.7-4.8     0      3.1-3.2    15      1.5-1.6    19
4.5-4.6     2      2.9-3.0    16      1.3-1.4    18
4.3-4.4     2      2.7-2.8    21      1.1-1.2     7
4.1-4.2     3      2.5-2.6    23       .9-1.0     3
3.9-4.0     5      2.3-2.4    25       .7- .8     2


1. The first step is to obtain the values of the 10th, 20th,
30th . . . 90th percentiles of a normal curve having

μ = X̄ = 2.388 and σ = s = .816.

The values of (X − μ)/σ corresponding to these points are read from
a normal probability table and recorded in column 2 of Table 5.3.
Each entry in that column is multiplied by σ = .816 to obtain
the entries in column 3. To these are then added μ = 2.388 to
obtain the entries in column 4, which are the abscissa values of
the percentiles on the baseline in Figure 5-2.
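
The first step can be sketched in Python, with the standard library's `statistics.NormalDist` taking the place of the printed normal probability table. This is an illustration and not part of the original text; the function name `normal_percentile_points` is an assumption. Because the library carries more decimal places than a three-place table, results may differ from the tabled entries by a unit in the third decimal.

```python
from statistics import NormalDist


def normal_percentile_points(mu, sigma, percents):
    """For each percentile return (percentile, z, z * sigma, mu + z * sigma).

    The four values correspond to columns 1 to 4 of Table 5.3, where
    z = (X - mu) / sigma is the standard normal deviate.
    """
    standard = NormalDist()          # mean 0, standard deviation 1
    rows = []
    for p in percents:
        z = standard.inv_cdf(p / 100)
        rows.append((p, z, z * sigma, mu + z * sigma))
    return rows


rows = normal_percentile_points(2.388, 0.816, [90, 80, 70, 60, 50, 40, 30, 20, 10])
for p, z, deviation, x in rows:
    print(f"{p:3d}  {z:7.3f}  {deviation:7.3f}  {x:6.3f}")
```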
TABLE 5.3  Computation of χ² Test for Normality of Distribution of
256 Sample Variances

Per-      (X − μ)/σ    X − μ      X       Frequency                Frequency in    f_i²
centile                                   below X                  interval f_i
   1          2          3        4           5                        6             7
  90        1.282      1.046    3.434    223 + k(7)  = 229.4         26.6         707.56
  80         .842       .687    3.075    208 + k(15) = 209.9         19.5         380.25
  70         .524       .428    2.816    171 + k(21) = 188.3         21.6         466.56
  60         .253       .206    2.594    148 + k(23) = 164.6         23.7         561.69
  50        0          0        2.388    123 + k(25) = 127.8         36.8        1354.24
  40        -.253      -.206    2.182     90 + k(33) = 111.8         16.0         256.00
  30        -.524      -.428    1.960     68 + k(22) =  80.1         31.7        1004.89
  20        -.842      -.687    1.701     49 + k(19) =  53.8         26.3         691.69
  10       -1.282     -1.046    1.342     12 + k(18) =  20.3         33.5        1122.25
                                                                     20.3         412.09
 Total                                                              256.0        6957.22

In column 5, k stands for the proportion of the interval lying below X.

$$\chi^2 = \frac{\sum f_i^2}{F_i} - N = \frac{6957.22}{25.6} - 256 = 15.8 \qquad n = 7$$

$$\chi^2_{.95} = 14.1 \qquad \chi^2_{.975} = 16.0$$


2. The second step is to compute the frequency in the histo- 
gram between each two of the successive points listed in Column 4. 
For this, the frequency is cumulated beginning at the lower end 
of the scale of scores, as in the computation of percentiles and 
of percentile ranks. The frequency below each point is then 
obtained as in column 5 of Table 5.3. 

It may be helpful to explain in detail the computation of one of
the entries in Column 5, say the first one. The frequency below
the point X = 3.434 is desired.

From Table 5.2 it is evident
that the point X = 3.434 lies in the interval marked 3.3-3.4, that
there are 7 cases in this interval and 223 cases below it. There-
fore below the point X = 3.434 there will be all of the 223 cases
plus part of the 7 cases. What part of the 7 cases? The interval
has as its real limits the values 3.25 and 3.45, so X = 3.434 stands
nearly at the top of the interval. The proportion of the interval
below this point is

$$\frac{3.434 - 3.25}{3.45 - 3.25} = \frac{.184}{.200}$$

and therefore the number of cases in the interval below X = 3.434
is (.184/.200)(7) = 6.44 and the entire number of cases below X = 3.434
is 223 + 6.4 = 229.4.

The first entry in Column 6 is 256 — 229.4 = 26.6 or the num- 
ber of cases above X = 3.434. The second is 229.4 — 209.9 = 19.5, 
or the number of cases between X = 3.434 and X = 3.075. All 
subsequent entries in this column are found in the same way. 
The last entry is 20.3 — 0 or the number of cases below X = 1.342. 
The entries in Column 6 are treated as observed frequencies in 
the χ² formula.
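
The interpolation described for the first entry of Column 5 can be sketched in Python. This is an illustration, not part of the original text; the function name `frequency_below` and its arguments are assumptions. The frequencies are those of Table 5.2 listed from the lowest interval (.7-.8) upward, and the zero for the interval 4.7-4.8 is inferred from the stated total of 256.

```python
# Frequencies of Table 5.2 from the interval .7-.8 up to 5.3-5.4.
frequencies = [2, 3, 7, 18, 19, 19, 22, 33,
               25, 23, 21, 16, 15, 7, 8, 3,
               5, 3, 2, 2, 0, 1, 1, 1]


def frequency_below(x, frequencies, first_lower_limit=0.65, width=0.20):
    """Number of cases below the point x, interpolating within an interval.

    Intervals have real limits first_lower_limit + k * width. Cases in an
    interval are assumed to be spread evenly between its real limits.
    """
    below = 0.0
    for k, f in enumerate(frequencies):
        lower = first_lower_limit + k * width
        upper = lower + width
        if x >= upper:
            below += f                          # whole interval lies below x
        elif x > lower:
            below += f * (x - lower) / width    # part of the interval lies below x
    return below


below_90th = frequency_below(3.434, frequencies)
above_90th = sum(frequencies) - below_90th
print(round(below_90th, 1), round(above_90th, 1))
```

For X = 3.434 the function returns 223 + (.184/.200)(7) = 229.44, so that 256 − 229.4 = 26.6 cases lie above the 90th percentile point, as in the first entry of Column 6.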

3. Since each F_i = N/10 = 25.6, the remainder of the computa-
tion is very simple.

$$\chi^2 = \sum \frac{f_i^2}{F_i} - N = \frac{6957.22}{25.6} - 256 = 15.8$$

There are 10 observations (10 values of f_i) and 3 restrictions upon
them, namely

(1) Σf_i = ΣF_i. The number of cases in the fitted normal curve is
assumed to be 256, which was the observed N.

(2) Σf_i X_i = ΣF_i X_i. This condition was imposed when the theoretical
distribution was given the same mean as the observed distribution.
The X's are arbitrary, the restriction being upon the frequencies.

(3) Σf_i(X_i − X̄)² = ΣF_i(X_i − X̄)². This condition was imposed when
the theoretical distribution was given the same variance as the
observed distribution.




The number of degrees of freedom is therefore n = 10 − 3 = 7.
For 7 degrees of freedom, χ²_.95 = 14.1 and χ²_.975 = 16.0. This test
justifies discarding the hypothesis of normality.

In this problem, μ and σ could have been obtained from the
mathematical characteristics of the sampling distribution of s².
However, in most similar situations where a χ² test is employed
there are no a priori values for μ and σ and these must be estimated
from X̄ and s. The latter method has been used here to illustrate
what is the usual procedure.


REVIEW EXERCISE 
1. Some of the following terms have been used in earlier chapters, and 
all of them are important for understanding this chapter. 


degrees of freedom parent population 

family of curves sampling distribution of a statistic 
limits of summation standard error 

normal population statistic 

normal probability curve unbiased estimate 

parameter variable subscript 
2. Review the meaning of these symbols, and decide which of them 

represent 


(a) a number which always has the same value 

(b) a number which has the same value for any given population but 
which may have different values for different populations 

(c) a number which changes its value from sample to sample 

(d) a symbol which indicates an operation to be performed 


X, M, 7, 8, Oy Rey 06; X-u, oy, and px. 
Purposes of the Chapter.* The principal goal of this
chapter is to give the student a sense of familiarity with the 
different sampling distributions which are most often called for 
in research studies, to acquaint him with the general shape of 
these distributions and with the more common situations to which 
they apply. To this end empirical distributions will be obtained 
by drawing many small samples, computing statistics from these 
samples, and plotting the frequency distributions of the various 
statistics. On the empirical distribution of each statistic there 
will be superimposed a mathematical curve obtained from the 
formula for the sampling distribution of that statistic. 

The chapter has been designed to accomplish certain second-
ary goals also. It provides experience in the use of a table of
random numbers. It provides a review of the principal computa-
tions of elementary statistics, that review occurring incidentally
in the course of obtaining the empirical data needed rather than 
occurring for its own sake. It provides an introduction to the 
material treated more fully in Chapters 7 to 9. 

Here too the student may observe an instance of that amazing 
parallelism so often noted between certain aspects of the world 
of phenomena and the world of abstract mathematical theory. 
One performs certain physical operations, draws a large number 
of samples, measures the individuals in those samples, makes cer- 
tain computations upon those measurements, draws a graph of 
the resulting statistics and lo, the form of the graph is similar 
to that of a curve determined by purely mathematical reasoning 
and having no implicit connection with the concrete data. Further- 
more, the shape of the graph obtained through this extensive 
manipulation of concrete measures is independent of the nature 
of the trait measured. That shape is the same whether measure-
ments are made of human beings, manufactured products or shell-
fish, whether the measures belong in the domain of physics,
psychology, business, or public opinion. The shape of the graph
depends only on certain characteristics of the population distri-
bution, the size of the samples drawn, and the choice of the
statistic computed.

* This chapter will be most helpful if you read it rapidly the first time without
pressing too hard for complete understanding, and if you return to it often as you study
the next three chapters. It would be wise to reread it completely after you have studied
Chapter 9.

The student who has completed the calculation of a statistic
usually wishes to use that statistic as basis for assertions about an 
unknown value of a parameter in the population. Of necessity 
he must check the statistic against a table derived from the mathe- 
matically calculated sampling distribution of the statistic. The 
observed parallelism between the empirical and the mathematical 
distributions should convince the student that the mathematical 
tables are a reflection of experiential reality. Then if the mathe- 
matical distribution indicates that the research worker should 
accept a hypothesis because his sample may have originated from 
the hypothetical population by chance, repeated sampling, which 
usually he would be unable to carry out, would have given him 
the same information. 

The student who goes carefully through the material of this 
chapter should never afterward fall prey to the popular fallacy 
that all sampling distributions are normal and he should know 
that the beneficent parallelism referred to in the preceding para-
graph is not discovered automatically but must be striven for by
learning which probability distribution belongs to any particular 
statistic. 

The Data. In Table XXIV of the Appendix is a set of pre- 
registration scores on the Cooperative Test Service English Test 
made by 447 college students. The distribution of scores has been 
tested for normality as described in Chapter 5 and the departure 
from normality has been found negligible. 

Because of the close approximation of its distribution to the 
normal, the set of 447 scores will play the part of a normal popula- 
tion in the remainder of this chapter. Random samples drawn 
from this set of scores will be considered as random samples from 
a normal population and will illustrate sampling from a true
normal population.

The mean, variance and standard deviation of the scores are

μ = 121.6; σ² = 1380.4; σ = 37.15


In Table XXIV the entries in the first column are serial num-
bers identifying the subjects. Each serial number has 3 digits
if zeros are inserted for missing digits in the tens or hundreds
place, so that 9 is written 009 and 21 is written 021. 

These scores in Table XXIV are now to be considered as de- 
fining a population from which random samples are to be drawn 
by means of a table of random numbers. If the same number 
should be drawn more than once, the individual represented by 
that serial number should be included as often as his number is 
drawn. This procedure is called sampling with replacement and 
has the effect of transforming the finite population of 447 cases 
into an infinite population with the same percentage distribution.
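
Sampling with replacement can be sketched in Python, with a pseudo-random generator standing in for the printed table of random numbers. This is an illustration, not part of the original text; the function name, the seed, and the use of Python's `random` module are all assumptions.

```python
import random


def draw_with_replacement(serial_numbers, sample_size, rng):
    """Draw sample_size serial numbers, each draw made from the full list.

    A serial number drawn more than once is kept as often as it is drawn.
    """
    return [rng.choice(serial_numbers) for _ in range(sample_size)]


rng = random.Random(1953)              # arbitrary seed, so the run can be repeated
serial_numbers = list(range(1, 448))   # serial numbers 1 through 447

sample = draw_with_replacement(serial_numbers, 5, rng)
large_sample = draw_with_replacement(serial_numbers, 1000, rng)
print(sample, len(set(large_sample)))
```

A sample of 1000 can be drawn from only 447 cases because every draw is made from the complete set, which is what allows the finite set of scores to play the part of an infinite population.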

Use of Table of Random Sampling Numbers. Several lists of 
random numbers are available.* Appendix Table XXIII is a 
part of the Table of 105,000 Random Decimal Digits published 
by the Interstate Commerce Commission. 

Suppose, for example, that it is desired to choose 6 cases at 
random out of a set of 50 cases. As 50 has 2 digits, we shall read 
2-digit numbers from the table disregarding all those larger than 
50. The first 6 numbers read will identify the 6 individuals se- 
lected. We should decide, preferably in a random manner, where 
to enter the table before we look at it, lest after we see the page 
we may be influenced by a possible desire to include or to exclude 
a particular individual. 

If we should decide to begin with block 1 and line 1, using the
first 2 digits, and to move downward across the table, the 6 in- 
dividuals chosen would be those having code numbers 10, 22, 24, 
42, 37, and 28. Note that 4 numbers have been passed over be-
cause they are larger than 50. If we should decide to begin in
block 9 and row 50, using the first 2 digits and moving upward
across the table, the individuals chosen would be those with code
numbers 27, 2, 39, 82, 4, and 44.

* Tippett, L. H. C., Random Sampling Numbers, with foreword by Karl Pearson,
Cambridge University Press, 1927.
Fisher, R. A. and Yates, F., Table XXXIII in Statistical Tables for Biological, Agricul-
tural and Medical Research, Oliver and Boyd, London, 1938.
Interstate Commerce Commission, Bureau of Transport Economics and Statistics,
Table of 105,000 Random Decimal Digits, Statement No. 4914, File No. 261-A-1,
Washington, D.C., May 1949, 29 pp.
Kendall, M. G. and Babington Smith, B., Tables of Random Sampling Numbers, Tracts
for Computers, No. 24, Cambridge University Press.
Snedecor, George W., Statistical Methods, Ames, Iowa, Iowa State College Press, 1946,
pp. 10-13.
The general criteria that a set of such numbers must meet in order to be useful in
sampling have been stated by Kendall and Babington Smith in “Randomness and Random
Sampling Numbers,” Journal of the Royal Statistical Society, 101 (1938), pp. 147-166.

It is permissible to begin at any point in the table, and to move 
in any direction which is convenient. The variety of ways of 
selecting digits from this table is enormous. Obviously a simple 
method by which numbers can be easily identified is preferable to 
a complicated method. 
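
The rule for reading the table can be sketched in Python. This is an illustration, not part of the original text: the function name is an assumption, and the list of 2-digit numbers is hypothetical. It contains the six accepted code numbers of the first example, but the four rejected numbers are invented, since the entries of Table XXIII are not reproduced here. Treating 00 as unusable is also an assumption, made because no case carries that code number.

```python
def select_from_table(two_digit_numbers, population_size, how_many):
    """Read numbers in order, keeping those that identify a case.

    Numbers larger than population_size (and 00) are passed over.
    Returns the code numbers chosen and the count of numbers passed over.
    """
    chosen = []
    passed_over = 0
    for number in two_digit_numbers:
        if 1 <= number <= population_size:
            chosen.append(number)
            if len(chosen) == how_many:
                break
        else:
            passed_over += 1
    return chosen, passed_over


# Hypothetical column of 2-digit numbers read downward from the table.
column = [10, 97, 22, 88, 24, 42, 76, 37, 52, 28, 45]
chosen, passed_over = select_from_table(column, population_size=50, how_many=6)
print(chosen, passed_over)
```

Reading stops as soon as 6 usable numbers have been found, so the final entry of the column is never consulted.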

A Class Project in Drawing Samples, Computing Certain 
Statistics and Obtaining the Random Sampling Distributions of 
those Statistics. If this project is undertaken by any group of 
persons it is essential that (a) all samples be drawn at random, 
(b) all samples be of the same size, (c) any two samples be inde- 
pendent. Unless these conditions are observed, the statistics can- 
not be expected to conform to the pattern of the theoretically 
established sampling distribution. Condition (c) will require 
some planning if a table of random sampling numbers is employed, 
to provide that no two persons use the table in quite the same 
way. The steps listed below have been carried out and results 
recorded in Specimen Worksheets. The student will find it con- 
venient to prepare similar forms in advance. 

In the directions set down here the sample size has been kept 
small for several reasons. (1) As sample size increases, the distri- 
butions of many statistics become more like the normal curve. The 
difference in form of the various curves is more dramatically por- 
trayed when N is small. (2) Drawing a number of small samples 
and combining them into one larger sample will help to make 
vivid the change which takes place in the sampling distribution
of a statistic as N increases. (3) For a given expenditure of time
the student will learn more by computing a variety of statistics 
for each of a few small samples than by computing fewer statistics 
for larger samples. 

The steps in computing the required statistics are as follows: 

1. From a table of random numbers draw 5 numbers of 447 
or smaller and enter them in the first row of Worksheet I in the 
column for Sample I. Similarly draw and record 5 serial numbers 
for each of the other three samples. 

2. Treating each number obtained in Step 1 as a serial number, 
read from Table XXIV the score of the subject that has that 
serial number and record it in row 2 of Worksheet I.

3. Carry out the computations indicated in rows 3 through 11 of
Worksheet I and check in the manner indicated at the bottom of the
Worksheet. The mean ($\bar{X}$), the variance ($s^2$), and the standard
deviation ($s$) have now been obtained for each small sample of 5 and
for the composite sample of 20.

SPECIMEN WORKSHEET I

                                                  Sample of 5                      Composite
Operation                               1          2          3          4         sample of 20

1. Serial numbers                    018, 074,  191, 296,  088, 015,  092, 414,
                                     308, 328,  117, 420,  284, 071,  171, 102,
                                     108        395        122        392
2. Corresponding scores, $X$         127, 165,  170, 157,  169, 137,  180, 78,
                                     47, 103,   105, 51,   123, 104,  152, 142,
                                     91         40         98         58
3. Sum of scores, $\sum X$           533        523        631        610        2297
4. Mean, $\bar{X} = \sum X/N$        106.6      104.6      126.2      122        114.85
5. Sum of squares of scores,
   $\sum X^2$                        64453      68775      82879      85116      301223
6. Correction term,
   $(\sum X)^2/N$                    56817.8    54705.8    79632.2    74420      263810.45
7. $\sum x^2 = \sum X^2 - (\sum X)^2/N$
                                     7635.2     14069.2    3246.8     10696      37412.55
8. $s^2 = \sum x^2/(N - 1)$          1908.8     3517.3     811.7      2674       1969.08
9. $d = \bar{X}_i - \bar{X}_{..}$    -8.25      -10.25     11.35      7.15
10. $Nd^2$                           340.3125   525.3125   644.1125   255.6125
11. $s = \sqrt{s^2}$                 43.7       59.3       28.5       51.7       44.4

Checks: Row 3   533 + 523 + 631 + 610 = 2297
        Row 4   (1/4)(106.6 + 104.6 + 126.2 + 122) = 114.85
        Row 5   64453 + 68775 + 82879 + 85116 = 301223
        Row 7   7635.2 + 14069.2 + 3246.8 + 10696 = 35647.2
        Row 10  340.3125 + 525.3125 + 644.1125 + 255.6125 = 1765.35
        Row 7, entry in last column   35647.2 + 1765.35 = 37412.55

4. Enter the computed statistics on Worksheet II and complete
the additional computations suggested there in rows 1 to 10.
SPECIMEN WORKSHEET II

Record of Statistics Computed from Samples

                                                      Sample of 5
Statistic                                     1        2        3        4

$\bar{X}$                                   106.6    104.6    126.2    122
$\bar{X} - \mu = \bar{X} - 121.6$           -15.0    -17.0      4.6       .4
$\sum x^2/\sigma^2 = \sum x^2/1380.4$        5.53    10.19     2.35     7.75

[The worksheet also carries rows for $(\bar{X} - \mu)/\sigma_{\bar{X}}$ and
$(\bar{X} - \mu)\sqrt{N}/s$, and a column for the composite sample of 20;
these entries are not legible in this copy.]


5. On Worksheet III, enter the following statistics:

Line 1 The difference of two means taken from line 4 of I.
       Thus $\bar{X}_1 - \bar{X}_2 = 106.6 - 104.6 = 2.0$
Line 2 The sum of 2 entries on line 7 of I
Line 3 Computation based on entry in preceding line
Line 4 Entry in line 1 divided by entry in line 3
Line 5 Ratio of two variances taken from line 8 of I.
       Thus 1908.8/3517.3 = .54


SPECIMEN WORKSHEET III

Record of Statistics for Comparison of Samples

                                                        Samples compared
Statistic                                             1, 2       2, 3       3, 4

1. $\bar{X}_i - \bar{X}_j$                             2.0      -21.6        4.2
2. $\sum x_i^2 + \sum x_j^2$                       21704.4    17316.0    13942.8
3. $s_{\bar{X}_i - \bar{X}_j}$                        32.9       29.4       26.4
4. $(\bar{X}_i - \bar{X}_j)/s_{\bar{X}_i - \bar{X}_j}$  .06       -.73        .16
5. $s_i^2/s_j^2$                                       .54       4.33        .30
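
The arithmetic of Worksheets I and III can also be checked by machine. The following is a minimal sketch in Python using the specimen scores recorded on Worksheet I. The function names are illustrative only, and the formula used for line 3 of Worksheet III (a pooled variance multiplied by 1/N_i + 1/N_j) is an assumption inferred from the specimen entries, not a formula stated on the worksheet.

```python
import math

# Scores of the four specimen samples, copied from row 2 of Worksheet I.
samples = [
    [127, 165, 47, 103, 91],
    [170, 157, 105, 51, 40],
    [169, 137, 123, 104, 98],
    [180, 78, 152, 142, 58],
]


def worksheet_rows(scores):
    """Rows 3 to 8 and row 11 of Worksheet I for one set of scores."""
    n = len(scores)
    total = sum(scores)                    # row 3
    mean = total / n                       # row 4
    sum_sq = sum(x * x for x in scores)    # row 5
    correction = total ** 2 / n            # row 6
    ss_dev = sum_sq - correction           # row 7
    variance = ss_dev / (n - 1)            # row 8
    return {"sum": total, "mean": mean, "sum_sq": sum_sq,
            "ss_dev": ss_dev, "variance": variance,
            "sd": math.sqrt(variance)}     # row 11


rows = [worksheet_rows(s) for s in samples]
composite = worksheet_rows([x for s in samples for x in s])


def compare(a, b, n=5):
    """Lines 1 to 5 of Worksheet III for two samples of n cases each."""
    diff = a["mean"] - b["mean"]                      # line 1
    pooled_ss = a["ss_dev"] + b["ss_dev"]             # line 2
    # line 3 (assumed form): pooled variance times (1/n + 1/n)
    se_diff = math.sqrt(pooled_ss / (2 * n - 2) * (2 / n))
    t_ratio = diff / se_diff                          # line 4
    var_ratio = a["variance"] / b["variance"]         # line 5
    return diff, pooled_ss, se_diff, t_ratio, var_ratio


for i in range(3):
    print(i + 1, i + 2, compare(rows[i], rows[i + 1]))
```

Rounded to the number of places carried on the worksheets, the printed values should agree with the specimen entries.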
We shall now consider some general issues before examining the
empirical distributions which have been produced by classes carrying
out the sampling project which has been described above.

Samples from a Normal Population. A large part of this book 
is concerned with the theory of samples drawn from a normal 
population. The sampling distributions which will be used are, 
therefore, based on this assumption. However it should be recog- 
nized that empirical studies of samples from non-normal popula- 
tions indicate that a considerable departure from normality does 
not invalidate the methods described. 

We shall, in this chapter, describe four of the most useful 
smooth sampling distributions of statistics calculated for samples 
from a normal population. Before proceeding with the descrip- 
tion of these distributions it is desirable to clarify the concepts of 
independence among statistics and of degrees of freedom. 

Independence of Statistics. An unexpected property of sta- 
tistics calculated for samples from a normal population is that 
certain pairs of statistics based on the same observations are in- 
dependent of each other. This property does not hold for all 
statistics but it does hold for the mean and the variance and also 
for the mean and the standard deviation. It does not hold even 
for these statistics if the population is not normal. 

When the mean and the standard deviation are both calcu- 
lated for each of many samples, they vary independently of each 
other. This principle is proved mathematically in treatises on 
the mathematical theory of statistics. In this text the independ- 
ence in the variation of the mean and the standard deviation is 
illustrated graphically in Figure 6-1 which shows the joint dis- 
tribution of these two statistics from 144 samples of 5 cases each 
drawn by students carrying out the sampling project. Without 
this independence between $\bar{X}$ and $s$ (or $s^2$) from the same sample
several of the procedures about to be described would not be 
correct. It is indeed a beneficent principle which very few people 
would recognize intuitively. Statistics from two or more random 
samples are always independent regardless of the nature of the 
population which is being sampled. 
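
The independence of mean and standard deviation can be illustrated by simulation as well as by Figure 6-1. The sketch below is only an illustration under stated assumptions: it draws samples of 5 from an idealized normal population (not from the 447 scores of Table XXIV) and, for contrast, from a skewed exponential population chosen arbitrarily. The function names and the seed are hypothetical.

```python
import random


def pearson_r(xs, ys):
    """Product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def mean_and_sd(sample):
    """Sample mean and standard deviation (divisor N - 1)."""
    n = len(sample)
    m = sum(sample) / n
    return m, (sum((x - m) ** 2 for x in sample) / (n - 1)) ** 0.5


def mean_sd_correlation(draw, n_samples=20000, n=5, seed=1):
    """Correlation between mean and s over many random samples."""
    rng = random.Random(seed)
    pairs = [mean_and_sd([draw(rng) for _ in range(n)])
             for _ in range(n_samples)]
    means, sds = zip(*pairs)
    return pearson_r(means, sds)


# Normal population: the correlation should be close to zero.
r_normal = mean_sd_correlation(lambda rng: rng.gauss(121.6, 37.15))
# Skewed (exponential) population: mean and s vary together.
r_skewed = mean_sd_correlation(lambda rng: rng.expovariate(1.0))
print(round(r_normal, 3), round(r_skewed, 3))
```

For the normal population the correlation should be near zero, as in Figure 6-1, while for the skewed population it should be clearly positive.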

Degrees of Freedom. The concept of degrees of freedom in 
connection with $\chi^2$ was discussed in Chapter 4. It also arises in
connection with observations made on a continuous scale and the 
correct choice of the number of degrees of freedom is often a very 
important part of the correct solution for a problem. 
Fig. 6-1. Joint frequency distribution of mean and standard deviation from
144 samples of 5 cases each. r = -.02


If N independent observations are made, these observations 
have N degrees of freedom, since each observation is free to take 
any value in the possible range of values. If those N observations
are from a normal population and if from them a mean and a 
standard deviation (or a variance) are both computed, then these 
N degrees of freedom are partitioned so that the distribution of 
the mean has 1 degree of freedom and the distribution of the 
standard deviation (or the variance) has N — 1 degrees of free- 
dom. In succeeding chapters the N degrees of freedom of N 
independent observations will be partitioned in various ways 
among independent statistics computed from those observations. 
Such partitioning of degrees of freedom is possible only when
statistics possess independence.

Five Important Distributions. While the number of forms of
sampling distribution is almost endless, there are five forms
(binomial, normal, chi-square, “Student’s,” and F distribution) which
are used so often that they are tabulated in practically every
textbook dealing with statistical inference. Three of these, the
binomial, the normal, and the chi-square distribution, have been
used in preceding chapters of this book. The binomial has already
been considered at some length and in this chapter we shall take
a preliminary look at the other four. Before discussing these
distributions in any detail, it will be helpful to give a general idea
of the type of statistic which gives rise to each of them.

Statistics Which Have a Normal Distribution. When the
distribution of the parent population is normal it can be proved
mathematically that the sampling distribution of certain statistics
is also normal. Among such statistics are the mean of a sample
and the difference between the means of two samples. It will be
helpful to give a general description of such normally distributed
statistics and then to show how certain familiar statistics can be
recognized as belonging to this general class.

If $X_i$ represents a random observation from a normal population
and $C_i$ is a constant, then $X_i + C_i$, $X_i - C_i$, $C_iX_i$, and $X_i/C_i$
are all normally distributed. Also if $X_i$ and $X_j$ are independent
random observations from a normal population, then $X_i + X_j$ and
$X_i - X_j$ are both normally distributed. Also the more general
expression

(6.1)    $C_0 + C_1X_1 + C_2X_2 + \cdots + C_NX_N$

represents a normally distributed variable if the C's are constants
and the X's are independent random observations from a normal
population. Since each term in (6.1) is of first degree in X that
expression is called a linear function of X or a linear function of
the observations.

The mean of a sample can be obtained from (6.1) by letting $C_0$
equal 0 and letting $C_1, C_2, \ldots, C_N$ each equal $1/N$. Hence the mean
of a random sample from a normal population is normally
distributed.

An interesting and important fact is that even if the population
is not normally distributed, the sampling distribution of the
mean may often be well approximated by the normal distribution
if samples are large (say N > 30). Under such circumstances,
the distribution of the mean can be treated as normal with mean
$\mu$ and standard error $\sigma/\sqrt{N}$.

If $\bar{X}$ is normally distributed, the statistic $\bar{X} - \mu$, obtained from
$\bar{X}$ by subtracting a constant, is also normally distributed with
mean 0 and standard error $\sigma/\sqrt{N}$. Then the statistic

    $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{(\bar{X} - \mu)\sqrt{N}}{\sigma}$

has the unit normal distribution. In Chapter 7 a normal distribution
related to the difference between the means of two samples
will be considered. In Chapter 10 a regression coefficient will be
treated as a linear function of the observations and as therefore
having a normal distribution.

Statistics Which Have a Chi-square Distribution. If $X_i$ is
normally distributed then certain statistics of the form

(6.2)    $C_1(X_1 - \bar{X})^2 + C_2(X_2 - \bar{X})^2 + \cdots + C_N(X_N - \bar{X})^2$

have the chi-square distribution. Here the C's are constants and
deviations from the mean are squared.

A very important statistic of this type is one in which all the
C's have the same value, $1/\sigma^2$. This statistic

    $\dfrac{\sum (X_i - \bar{X})^2}{\sigma^2} = \dfrac{(N - 1)s^2}{\sigma^2}$

will be discussed in Chapter 8. It has N - 1 degrees of freedom.
When the subject of analysis of variance is discussed, the X’s will
often be replaced by the means of subsamples.

The statistic $\sum (f_i - F_i)^2/F_i$ treated in Chapter 4 is similar to (6.2)
because for large samples $(f_i - F_i)$ has a distribution which is
approximately normal with mean zero, like that of $(X - \bar{X})$, and
$C_i$ may be taken as $1/F_i$. Consequently the chi-square distribution
was found to be a satisfactory approximation to the distribution
of this statistic unless the number of observations is small.
Several classes of students in a statistics course have drawn
random samples of 5 cases each from the data of Table XXIV
as described at the beginning of this chapter. The statistic
$\sum (X - \bar{X})^2/\sigma^2$ was computed from each of 125 of these samples
and the results recorded in Table 6.1. Figure 6-2 shows the
distribution of these empirical values with the theoretical $\chi^2$
distribution for 4 degrees of freedom superimposed. Read a few values
from Table VIII, and mark their position on the baseline of the graph,
as for example $\chi^2_{.10} = 1.1$, $\chi^2_{.50} = 3.36$ and $\chi^2_{.90} = 7.8$.
From each of the random samples you have yourself drawn compute
the same statistic and mark its position on the baseline.

If the same statistic were computed for each of the composite
samples of 20 cases, its distribution would have 19 degrees of freedom
instead of 4 and would be considerably less skewed than the one
illustrated in Figure 6-2.

Fig. 6-2. Distribution of $\sum (X - \bar{X})^2/\sigma^2$ in 125 samples of 5 cases
each, and theoretical $\chi^2$ distribution with 4 degrees of freedom.


TABLE 6.1 Distribution of $\sum (X - \bar{X})^2/\sigma^2$ in 125 Samples of 5 Cases Each


 $\chi^2$*    f    Cum. %     $\chi^2$*    f    Cum. %     $\chi^2$*    f    Cum. %
  17.2     1    100.0                                      5.7     6     74.4
  16.7     1     99.2                                      5.2     6     69.6
  16.2           98.4                                      4.7    10     64.8
  15.7     1     98.4                                      4.2     8     56.8
  15.2           97.6                   2                  3.7     7     50.4
  14.7           97.6                   3                  3.2     4     44.8
  14.2     1     97.6                   2                  2.7    16     41.6
  13.7           96.8                   4                  2.2     8     28.8
  13.2           96.8                   2                  1.7    11     22.4
  12.7     1     96.8                   5                  1.2     7     13.6
  12.2     1     96.0                   4                   .7     7      8.0
2 2 E 


* Middle of the interval. 


The cumulative $\chi^2$ distribution with 4 degrees of freedom is
drawn in Figure 6-3 and the points determined by the cumulative
percents from the empirical distribution of Table 6.1 are drawn
by heavy dots. The cumulative distributions show very strikingly
the correspondence between the empirical distribution obtained
from values computed by many different persons from random
samples they had drawn and the $\chi^2$ curve plotted from a
mathematical formula.
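
The statistic tabulated in Table 6.1 is easily computed for any sample. The following minimal sketch first evaluates it for the four specimen samples of Worksheet I, using 1380.4 as the value of the population variance as on Worksheet II, and then repeats the class experiment by simulation. The simulated population is an idealized normal one rather than the scores of Table XXIV, and the function name and seed are illustrative assumptions.

```python
import random

SIGMA = 37.15
SIGMA2 = 1380.4  # value of sigma squared used on Worksheet II


def chi_square_stat(sample, sigma2):
    """Sum of squared deviations from the sample mean, divided by sigma^2."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample) / sigma2


specimen = [
    [127, 165, 47, 103, 91],
    [170, 157, 105, 51, 40],
    [169, 137, 123, 104, 98],
    [180, 78, 152, 142, 58],
]
specimen_values = [round(chi_square_stat(s, SIGMA2), 2) for s in specimen]
print(specimen_values)

# Simulated experiment: many samples of 5 from a normal population.
rng = random.Random(2)
simulated = [
    chi_square_stat([rng.gauss(121.6, SIGMA) for _ in range(5)], SIGMA ** 2)
    for _ in range(20000)
]
simulated_mean = sum(simulated) / len(simulated)
print(round(simulated_mean, 2))
```

The mean of a chi-square variable equals its number of degrees of freedom, so the simulated mean should lie close to 4 for samples of 5 cases.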
Fig. 6-3. Cumulative $\chi^2$ distribution with 4 degrees of freedom and cumulative
percents of distribution of 125 values of $\sum (X - \bar{X})^2/\sigma^2$ from samples of 5 cases.


Statistics Which Have “Student’s” * Distribution. In making
inferences about the mean of a population it is necessary to
take into account the standard deviation of that population. In
most practical situations neither $\mu$ nor $\sigma$ is known and their
estimates $\bar{X}$ and $s$ must be used.

In Chapter 7 it will be seen that an important statistic in
problems about the mean is

(6.3)    $t = \dfrac{(\bar{X} - \mu)\sqrt{N}}{s}$

which has “Student’s” distribution when the observations are
independent and normally distributed. This statistic can be
written as $t = (\bar{X} - \mu)/s_{\bar{X}}$, where

(6.4)    $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$

is a sample estimate of the standard error $\sigma/\sqrt{N}$. In order that
the t ratio shall have “Student’s” distribution the numerator must
be a normally distributed variable with mean zero and the
denominator must be the square root of a variable which is
distributed as $\chi^2$ independently of the numerator. To illustrate the
difference in the distributions of $(\bar{X} - \mu)/(\sigma/\sqrt{N})$ and
$(\bar{X} - \mu)/(s/\sqrt{N})$, Table 6.2 has been prepared. For 338 random
samples of 5 cases each obtained in the sampling experiment, the
statistic

    $\dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{\bar{X} - 121.6}{16.61}$

was computed and the results tabulated in the second column of
Table 6.2. For 459 samples the statistic

    $\dfrac{\bar{X} - \mu}{s/\sqrt{N}} = \dfrac{(\bar{X} - 121.6)\sqrt{5}}{s}$

was computed and the results tabulated in the third column. A
glance at the table is enough to reveal that the latter distribution
has a much wider spread and that extreme deviates are more
frequent. The cumulative percents for the two distributions are
shown in columns 4 and 5 of Table 6.2.

* “Student” was a well-known British statistician named William Sealy Gosset
(1876-1937) who was adviser to the Guinness brewery in Dublin. A ruling of that firm
forbidding their employees to publish the results of research was relaxed to allow him
to publish mathematical and statistical research under a pseudonym. His paper on
“The Probable Error of a Mean,” published in Biometrika in 1908, which first called
attention to the fact that the normal curve does not properly describe the distribution
of the ratio of mean to standard error in small samples, is now a classic.

Fig. 6-4. Cumulative normal distribution with cumulative percents of
distribution of $(\bar{X} - \mu)\sqrt{5}/\sigma$ shown by dots and cumulative percents of
distribution of $(\bar{X} - \mu)\sqrt{5}/s$ by small crosses.

A cumulative normal curve has been drawn in Figure 6-4.
(This is easily done from the figures in Table I or Table II in the
Appendix.) The cumulative percents for the distribution of
TABLE 6.2 Distribution of $(\bar{X} - \mu)\sqrt{5}/\sigma$ and of $(\bar{X} - \mu)\sqrt{5}/s$
from Samples of 5 Cases Each

                   Frequency             Percent of frequency below
                                         upper limit of interval
Interval     with $\sigma$   with $s$      with $\sigma$   with $s$

   4.5                      1                        100.0
   4.2                                                99.8
   3.9                                                99.8
   3.6                      5                         99.8
   3.3                      1                         98.7
   3.0                      3                         98.5
   2.7                      6                         97.8
   2.4           1          6          100.0          96.5
   2.1           6         10           99.7          95.2
   1.8           7          7           97.9          93.0
   1.5          15         14           95.9          91.5
   1.2          17         30           91.4          88.4
    .9          24         26           86.4          81.9
    .6          36         40           79.3          76.3
    .3          34         42           68.6          67.5
    0           57         79           58.6          58.4
  - .3          38         49           41.7          41.2
  - .6          35         29           30.5          30.5
  - .9          16         35           20.1          24.2
  -1.2          25         25           15.4          16.6
  -1.5          22         12            8.0          11.1
  -1.8           4         11            1.5           8.5
  -2.1                      8             .3           6.1
  -2.4                      5             .3           4.4
  -2.7           1          4             .3           3.3
  -3.0                      3                          2.4
  -3.3                      4                          1.7
  -3.6                      1                           .9
  -3.9                      2                           .6
  -4.2                      1                           .2

TOTAL          338        459
MEAN         0.007      0.016
VARIANCE      0.84       1.63


$(\bar{X} - 121.6)/16.61$, listed in column 4 of Table 6.2, are represented
by heavy dots which cluster closely around the cumulative normal
curve. The cumulative percents for the distribution of
$(\bar{X} - 121.6)\sqrt{5}/s$, taken from column 5, are represented by small
crosses. These depart considerably from the normal curve at the ends
of the distribution, and it is the ends of the distribution which are
strategic in most problems.

In Chapter 7 there will be a discussion of the theoretic distribution
to which these points do conform. It is “Student's”
distribution with N - 1 degrees of freedom, and for these samples
N - 1 = 4. The cumulative curve for this distribution has been
drawn in Figure 6-5, with empirical values shown by dots and
crosses as before. Here the crosses representing points on the
distribution for $(\bar{X} - 121.6)\sqrt{5}/s$ conform closely to the
theoretical curve and the dots do not.

Fig. 6-5. Cumulative “Student's” distribution with 4 degrees of freedom,
cumulative percents of distribution of $(\bar{X} - \mu)\sqrt{5}/\sigma$ shown by dots, and
cumulative percents of distribution of $(\bar{X} - \mu)\sqrt{5}/s$ by small crosses.

If the statistic $(\bar{X} - \mu)\sqrt{N}/\sigma$ were computed for each sample
of 20 cases, obtained by combining 4 samples of 5 cases, its
distribution would also approximate the unit normal distribution
and its cumulative distribution would differ only by chance from
that shown by dots in Figures 6-4 and 6-5.

However if the statistic $(\bar{X} - \mu)\sqrt{N}/s$ were computed for each
sample of 20 cases, its distribution would be far more nearly
normal than that obtained from the samples of 5 cases, and its
cumulative distribution would differ considerably from that shown
by the crosses in Figures 6-4 and 6-5.
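
The contrast between the two ratios can be reproduced by simulation. The sketch below is an illustration only: it assumes an idealized normal population with the mean and standard deviation quoted in this chapter, and the function names, the seed, and the cut-off of 1.96 are choices made for the example. It counts how often each ratio falls outside the limits that enclose 95 percent of the unit normal distribution.

```python
import math
import random

MU, SIGMA, N = 121.6, 37.15, 5


def z_and_t(sample):
    """Return (mean - mu)*sqrt(N)/sigma and (mean - mu)*sqrt(N)/s."""
    n = len(sample)
    m = sum(sample) / n
    s = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))
    z = (m - MU) * math.sqrt(n) / SIGMA
    t = (m - MU) * math.sqrt(n) / s
    return z, t


def share_beyond(values, cut=1.96):
    """Proportion of values whose absolute size exceeds the cut-off."""
    return sum(abs(v) > cut for v in values) / len(values)


rng = random.Random(3)
pairs = [z_and_t([rng.gauss(MU, SIGMA) for _ in range(N)])
         for _ in range(20000)]

z_tail = share_beyond([z for z, _ in pairs])
t_tail = share_beyond([t for _, t in pairs])
print(round(z_tail, 3), round(t_tail, 3))
```

The ratio computed with the population standard deviation should exceed the limits in about 5 percent of samples, while the ratio computed with s should do so more than twice as often, which is the wider spread visible in Table 6.2.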
The ratio of the difference between two means to the sample
estimate of the standard error of that difference has under certain
circumstances “Student's” distribution. Problems employing this
ratio will be discussed in Chapter 7. The statistic
$(\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$ was computed on line 4 of Worksheet III.
Under the circumstances in which these samples were obtained, this
ratio should have “Student's” distribution with 8 degrees of freedom.
It has been computed for 200 pairs of samples and the results recorded
in Table 6.3 and graphed in Figure 6-6.

Fig. 6-6. Cumulative “Student's” distribution for 8 degrees of freedom and
cumulative percents of distribution of $t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$ for 200
pairs of samples of 5 cases each.


TABLE 6.3 Distribution of $t = (\bar{X}_1 - \bar{X}_2)/s_{\bar{X}_1 - \bar{X}_2}$ for 200 Pairs of Samples
of 5 Cases Each


———————— 


ا 
X-X: у X-X: n X-X: у‏ 
Sh- 8-2‏ .8-2 
Ll 1‏ —-13— 5 16 -14 1 41-43 
6 14 —-1.6— 12 13 -1.1 3.8-4.0 
4 1.7 —-19 = 14 10 -8 1 3.5-3.7 
3 2.0 —-2.2 = 16 4 -5 1 3.2-3.4 
5 23 —-2.5 = 16 4 -2 1 2.9-3.1 
1 2.6 —-2.8 — 35 4 -4— 2.6-2.8 
1 2.9 —-8.1 = 27 2—-4- 2.3-2.5 
The ratios of other normally distributed statistics to the sample
estimate of their standard errors will be considered in later chapters.

Statistics Which Have the F Distribution. Whenever there
are two independent estimates of $\sigma^2$, the ratio of one estimate to
the other

(6.5)    $F = \dfrac{s_1^2}{s_2^2}$

has a sampling distribution of the form sometimes called the F
distribution or the variance-ratio distribution. The preceding
statement should be qualified by the usual assumption that
observations are drawn at random from a normal universe. Sometimes
the two estimates come from independent samples. Then
the problem may relate to a comparison of the variability in two
populations. Chapter 8 will contain problems of this nature.
Sometimes the two estimates are independent statistics obtained
from the same set of observations. A great variety of important


TABLE 6.4 Distribution of $F = s_1^2/s_2^2$ for 400 Pairs of Variances, Each
Variance being Obtained from a Sample of 5 Cases

        Number   Cumu-  |         Number   Cumu-  |         Number   Cumu-
 F*       of     lative |  F*       of     lative |  F*       of     lative
       samples     %    |        samples     %    |        samples     %

18.0       1    100.00  | 12.0             97.75  |  6.0             93.25
17.7             99.75  | 11.7       1     97.75  |  5.7       5     93.25
17.4             99.75  | 11.4             97.50  |  5.4       1     92.00
17.1       1     99.75  | 11.1             97.50  |  5.1       2     91.75
16.8             99.50  | 10.8       1     97.50  |  4.8       4     91.25
16.5             99.50  | 10.5             97.25  |  4.5       3     90.25
16.2       1     99.50  | 10.2       1     97.25  |  4.2       6     89.50
15.9             99.25  |  9.9             97.00  |  3.9      11     88.00
15.6             99.25  |  9.6             97.00  |  3.6       9     85.25
15.3       1     99.25  |  9.3             97.00  |  3.3       7     83.00
15.0             99.00  |  9.0       2     97.00  |  3.0      11     81.25
14.7       2     99.00  |  8.7             96.50  |  2.7      12     78.50
14.4             98.50  |  8.4             96.50  |  2.4      22     75.50
14.1       1     98.50  |  8.1       1     96.50  |  2.1      17     70.00
13.8             98.25  |  7.8       2     96.25  |  1.8      22     65.75
13.5             98.25  |  7.5       1     95.75  |  1.5      27     60.25
13.2             98.25  |  7.2             95.50  |  1.2      35     53.50
12.9       1     98.25  |  6.9       2     95.50  |   .9      57     44.75
12.6             98.00  |  6.6       4     95.00  |   .6      76     30.50
12.3       1     98.00  |  6.3       3     94.00  |   .3      46     11.50

* Values of F are the upper limits of step intervals.
Fig. 6-7. Distribution of $F = s_1^2/s_2^2$ for 400 pairs of samples of 5 cases each,
and theoretical F distribution for 8 degrees of freedom.
(7 cases at upper end of distribution are not shown on graph.)


problems can give rise to such variances, one of the commonest
being a test of the hypothesis that several populations have the
same mean. This problem is discussed in Chapter 9.

Let $n_1$ represent the degrees of freedom associated with the
numerator of F and $n_2$ the degrees of freedom associated with the
denominator. Then for every pair of values of $n_1$ and $n_2$ there
exists a distinct distribution. Since $s_1^2$ and $s_2^2$ are both positive
their ratio is positive and values of F extend from 0 indefinitely
in the positive direction.

Fig. 6-8. Cumulative percentage distribution of 400 values of $F = s_1^2/s_2^2$,
and certain selected percentiles of the theoretic F distribution.

In the sampling experiment, each of many statistics students 
divided the variance of the first sample he drew by the variance 
of the second, obtaining a value of F. In Table 6.4 are presented 
400 such sample values of F. These are shown graphically in 
Figure 6-7 together with the theoretic F distribution. Certain 
percentile values are indicated along the baseline in case you wish 
to compute F ratios from your own data and compare them with
this distribution. The cumulative distribution from the empirical 
data of Table 6.4 has been plotted in Figure 6-8. Small circles 
indicate the values of selected percentiles from the theoretic F 
distribution. 
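
The experiment of dividing one sample variance by another can likewise be imitated by simulation. The sketch below assumes an idealized normal population in place of the class data, and the names and seed are illustrative. The value 6.39 is the tabled 95th percentile of F for 4 and 4 degrees of freedom.

```python
import random

rng = random.Random(4)


def sample_variance(sample):
    """Variance with divisor N - 1."""
    n = len(sample)
    m = sum(sample) / n
    return sum((x - m) ** 2 for x in sample) / (n - 1)


def draw_variance(n=5):
    """Variance of one random sample of n cases from a normal population."""
    return sample_variance([rng.gauss(121.6, 37.15) for _ in range(n)])


# Each F is the variance of a first sample divided by that of a second.
ratios = [draw_variance() / draw_variance() for _ in range(20000)]

share_below_1 = sum(f < 1 for f in ratios) / len(ratios)
share_below_639 = sum(f < 6.39 for f in ratios) / len(ratios)
print(round(share_below_1, 3), round(share_below_639, 3))
```

Because the two variances have the same number of degrees of freedom, about half the ratios should fall below 1, and about 95 percent should fall below 6.39. Table 6.4 shows a similar pattern, with 94.0 percent of the empirical values at or below 6.3.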


7 Inferences Concerning the Mean or the Difference Between Two Means


In this chapter, methods will be described for testing
hypotheses regarding $\mu$ and for calculating estimates of $\mu$. In
addition, methods for comparing the means of two populations,
$\mu_1$ and $\mu_2$, will be considered. For these purposes the normal
distribution and “Student's” * distribution will be the required
sampling distributions.

The Assumption of a Normal Population. In this chapter 
and in most of the following chapters the assumption will be made 
that measures in a sample are independently drawn from a normal 
population. This assumption is justified by the approximately 
normal shape assumed by many empirical distributions. Studies 
indicate that some departure from normality does not invalidate
the methods to be described. 

Where the assumption of normality or of independence is not 
made this fact will be indicated. Where it seems important to 
state explicitly the assumption of normality it will be stated. 

The Central Limit Theorem. A theorem of far-reaching
importance in statistics is called the Central Limit Theorem.
It states that for a wide variety of populations statistics based on
large random samples are approximately normally distributed. This
applies to nearly all populations which are likely to be considered in
practice, and to most statistics considered in this book. This is
especially true of statistics to be considered in this chapter. It is
less true of the correlation coefficient to be considered in Chapter 10,
and of the sample proportion p when P is near 1 or 0, because their
distributions depart greatly from normality unless the sample is
exceptionally large. Also if classes are few, distributions of $\chi^2$ and
F approach the smooth $\chi^2$ distribution, not the normal, as sample size
increases.
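
The working of the theorem for the mean can be seen in a small simulation. The sketch below is an illustration under assumptions not made in the text: the population is taken to be exponential (markedly skewed, with mean 1 and standard deviation 1), samples are of 30 cases, and the names and seed are arbitrary. It standardizes each sample mean and counts how many fall within the limits that enclose 95 percent of the unit normal distribution.

```python
import math
import random

rng = random.Random(5)
N = 30          # cases per sample
POP_MEAN = 1.0  # mean of the assumed exponential population
POP_SD = 1.0    # its standard deviation


def standardized_mean():
    """(sample mean - mu) * sqrt(N) / sigma for one random sample."""
    m = sum(rng.expovariate(1.0) for _ in range(N)) / N
    return (m - POP_MEAN) * math.sqrt(N) / POP_SD


zs = [standardized_mean() for _ in range(20000)]
share_within = sum(abs(z) < 1.96 for z in zs) / len(zs)
print(round(share_within, 3))
```

Although the population itself is far from normal, the proportion of standardized means within the limits should be close to .95.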


* See footnote on page 135. 
$\bar{X}$ as an Estimate of $\mu$. It was stated in Chapter 5 that
$E(\bar{X}) = \mu$ and that, therefore, $\bar{X}$ is an unbiased estimate of $\mu$.
It is easy to see that $\bar{X}$ is also a consistent estimate of $\mu$, for

(7.1)    $\sigma_{\bar{X}} = \dfrac{\sigma}{\sqrt{N}}$

and so $\sigma_{\bar{X}}$ decreases as N increases. For large samples $\bar{X}$ can be
considered as normally distributed about $\mu$ with standard deviation
$\sigma_{\bar{X}}$ even if the population has a non-normal distribution.
Even for relatively small samples out of a non-normal population
the mean has a distribution which is approximately normal. Since
$\sigma_{\bar{X}}$ can be made arbitrarily small by making N sufficiently large,
the probability that $\bar{X}$ deviates from $\mu$ by a given amount can
also be made arbitrarily small. The argument for the consistency
of $\bar{X}$ corresponds to the argument for the consistency of p given
on page 52.

There are many other possible methods of estimating $\mu$, such
as using the median, or the mode, or the average of the 25th and
75th percentiles, or the average of the 7th and 93rd percentiles,
or the midpoint of the range, etc. In samples from a normal
population every other estimate of $\mu$ has a standard error larger than
$\sigma_{\bar{X}}$. Since the mean is the estimate of $\mu$ which has the smallest
standard error in samples from a normal population, it is called
the efficient estimate of $\mu$ for such a population. When the population
is not normal some other estimate such as the median may
have a smaller standard error than the mean and so be more
efficient. Thus $\bar{X}$ provides an estimate of $\mu$ which is unbiased,
consistent, and efficient when the population is normal.
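
The relative efficiency of mean and median can be examined by simulation. In the sketch below the normal population, the double-exponential population used as a non-normal contrast, the sample size of 25, and all names are assumptions chosen for illustration. The standard error of each estimate is approximated by the standard deviation of its values over many samples.

```python
import random
import statistics


def standard_errors(draw, n=25, reps=4000, seed=6):
    """Empirical standard errors of the mean and the median."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [draw(rng) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    return statistics.pstdev(means), statistics.pstdev(medians)


def double_exponential(rng):
    """A symmetric, long-tailed variate: random sign times an exponential."""
    return rng.choice((-1, 1)) * rng.expovariate(1.0)


se_mean_normal, se_median_normal = standard_errors(lambda rng: rng.gauss(0, 1))
se_mean_dexp, se_median_dexp = standard_errors(double_exponential)

print(round(se_mean_normal, 3), round(se_median_normal, 3))
print(round(se_mean_dexp, 3), round(se_median_dexp, 3))
```

For the normal population the mean should show the smaller standard error (near 1/5, the value given by Formula (7.1) with standard deviation 1 and N = 25), while for the long-tailed population the median should show the smaller one.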


EXERCISE 7.1

1. For each of the given values of N and $\sigma$, find the value of $\sigma_{\bar{X}}$. Make
a table of results and notice the way in which $\sigma_{\bar{X}}$ decreases as N increases,
$\sigma$ remaining constant. Notice also the way in which $\sigma_{\bar{X}}$ increases as
$\sigma$ increases, N remaining constant.

a. $\sigma$ = 5, N = 4        d. $\sigma$ = 5, N = 1000     g. $\sigma$ = 10, N = 25
b. $\sigma$ = 5, N = 25       e. $\sigma$ = 2, N = 25       h. $\sigma$ = 20, N = 25
c. $\sigma$ = 5, N = 100      f. [illegible]

2. Suppose in a population $\mu$ = 48.5 and $\sigma^2$ = 15.4. What is the
probability that the mean of a sample of 20 cases will be larger than 50?
Between 47 and 50? Answer the same questions if N = 100.

Tests About $\mu$ When $\sigma$ Is Known. In practice this situation
is not very common, for usually if $\mu$ is unknown $\sigma$ is also unknown.
However it is the simplest of the situations to be considered and
a natural introduction to the others. The population of 447 cases
used in Chapter 6 provides a good illustration. Here $\mu$ = 121.6
and $\sigma$ = 37.15. Suppose a student reports obtaining a sample
of 5 cases in which $\bar{X}$ = 165.4. This seems very large. If his
computations are not available for examination, shall it be
assumed that he has made an error?

Since the population variance is known, $\sigma_{\bar{X}} = \sigma/\sqrt{N}$ and

(7.2)    $z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{N}} = \dfrac{\sqrt{N}(\bar{X} - \mu)}{\sigma}$

Applying Formula (7.2) to the reported data we have

    $z = \dfrac{\sqrt{5}\,(165.4 - 121.6)}{37.15} = 2.64$

Tables of normal probability indicate that the area outside the
two ordinates at z = ±2.64 is .008. The hypothesis that 165.4
is a purely random deviate from the expected value of 121.6 is
scarcely tenable and it would be wise to examine the student’s
work to see whether he has misunderstood the procedure or made
a mistake in computation. However it must be noted that if a
thousand samples are drawn about 8 of them might be expected
to have |z| > 2.64 and one should not be too confident that in
any particular sample a large departure from the expected value
is due to a mistake.
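
The computation just made can be expressed in a few lines. This is a minimal sketch with illustrative function names. The two-tailed area is obtained from the complementary error function of the Python standard library rather than from a printed table.

```python
import math


def z_for_mean(xbar, mu, sigma, n):
    """Formula (7.2): sqrt(N) * (mean - mu) / sigma."""
    return math.sqrt(n) * (xbar - mu) / sigma


def two_tailed_area(z):
    """Area of the unit normal curve outside the ordinates at -|z| and +|z|."""
    return math.erfc(abs(z) / math.sqrt(2))


z = z_for_mean(165.4, 121.6, 37.15, 5)
area = two_tailed_area(z)
print(round(z, 2), round(area, 3))
```

The printed values should agree with the 2.64 and .008 quoted above.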

“Student’s” Distribution. When $\sigma$ is not known, $\sigma_{\bar{X}}$ must be
estimated from sample data by the formula

(7.3)    $s_{\bar{X}} = \dfrac{s}{\sqrt{N}}$

If observations are drawn from a normal population, the ratio
$(\bar{X} - \mu)/s_{\bar{X}}$ has what is called “Student's” distribution, and is
customarily denoted by the letter t to distinguish it from z which
customarily denotes a variable with unit normal distribution.

(7.4)    $t = \dfrac{\bar{X} - \mu}{s/\sqrt{N}} = \dfrac{\sqrt{N}(\bar{X} - \mu)}{s}$

“Student’s” distribution is sometimes called the t-distribution.
Because this chapter presents the first situation in which
“Student’s” distribution is needed, it is necessary now to pause for a
discussion of the form of that distribution and the tables available.
“Student’s” distribution is symmetrical with maximum ordinate
at t = 0. Its shape changes as the number of degrees of
freedom changes. Therefore “the” curve is in reality a whole
family of curves. As n increases, the curve approaches the normal
form so that for n as large as 30 or 40 the normal probability
table can be used satisfactorily in most problems. By calculus
it can be shown that the formula for the normal curve is the limit
approached by the formula for “Student’s” distribution as n
becomes very large. When n is very small the area in the tails of
“Student's” distribution is considerably greater than the
corresponding area in the tails of the normal distribution, as can be
seen in Figures 7-1 and 7-2 where the normal distribution and
“Student's” distribution with n = 1 are shown together. The
same relation has already been seen in Figures 6-4 and 6-5.

Fig. 7-1. The normal distribution (N) and “Student’s” distribution
with 1 degree of freedom (S).

Fig. 7-2. Cumulative normal distribution (N) and cumulative curve of
“Student's” distribution with 1 degree of freedom (S).
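
The approach of “Student's” curve to the normal curve can be checked numerically. The text does not print the formula for the curve, so the sketch below uses the standard density of “Student's” distribution with n degrees of freedom; that formula, and the function names, are supplied for the example.

```python
import math


def student_density(t, n):
    """Ordinate of 'Student's' distribution with n degrees of freedom."""
    log_const = (math.lgamma((n + 1) / 2) - math.lgamma(n / 2)
                 - 0.5 * math.log(n * math.pi))
    return math.exp(log_const - (n + 1) / 2 * math.log(1 + t * t / n))


def normal_density(t):
    """Ordinate of the unit normal curve."""
    return math.exp(-t * t / 2) / math.sqrt(2 * math.pi)


# Ordinates at the center and in the tail for increasing degrees of freedom.
for n in (1, 4, 30, 1000):
    print(n, round(student_density(0, n), 4), round(student_density(3, n), 4))
print("normal", round(normal_density(0), 4), round(normal_density(3), 4))
```

With 1 degree of freedom the curve is lower at the center and much higher at t = 3 than the normal curve, as in Figure 7-1. As n grows, both ordinates move toward the normal values.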

In Table IX to be found in the Appendix, each row belongs to
a different probability curve. These curves are distinguished by
the number of degrees of freedom specified in the column at the
left. The tabular entries are values of t. In the column headed
$t_{.90}$ each entry is the 90th percentile of the probability
distribution to which it belongs. A similar description applies to the
other columns. Notice that when the subscript for t is less than .50 the
value of t is negative. Thus $t_{.10} = -t_{.90}$, $t_{.05} = -t_{.95}$, etc. The
entries in the last row of the table are percentile values for the
normal curve.


EXERCISE 7.2 
1. Study the following statements and translate them into words. Sketching
a probability curve may help your understanding.


$P(t < t_{.95}) = .95$          $P(t_{.03} < t < t_{.95}) = .92$
$P(t < t_{.01}) = .01$          $P(t_{.05} < t < t_{.89}) = .84$
$P(t > t_{.99}) = .01$          $P(t > t_{.99} \text{ or } t < t_{.01}) = .02$
$P(t > t_{.07}) = .93$          $P(t > t_{.95} \text{ or } t < t_{.05}) = .10$


2. Examine Table IX to identify the entries which support the
following statements:


$P(t < 1.4 \mid n = 8) = .90$          $P(t > -1.34 \mid n = 15) = .90$
$P(t < -2.31 \mid n = 8) = .025$       $P(-1.37 < t < 1.37 \mid n = 10) = .80$
$P(t < -1.35 \mid n = 13) = .10$       $P(-2.18 < t < 2.18 \mid n = 12) = .95$
$P(t > 3.75 \mid n = 4) = .01$         $P(-3.00 < t < 3.00 \mid n = 7) = .98$


3. Verify the following statements by reference to Table IX.


a. For $n = 11$, $P(t > 2.2) = .025$
                 $P(t < 2.2) = .975$
                 $P(-2.2 < t < 2.2) = .95$
b. If ordinates are to be drawn to cut off the extreme 1% of the area
in each tail of the curve
for $n = 5$, they should be drawn at $-3.36$ and $+3.36$
for $n = 10$, they should be drawn at $-2.76$ and $+2.76$
for $n = 25$, they should be drawn at $-2.48$ and $+2.48$
for the normal curve, they should be drawn at $-2.326$ and $+2.326$
c. As the eye follows down any selected column of the table, the
entries grow smaller as $n$ increases, and they appear to converge
upon the entries in the last row of the table.
d. The entries in the row headed $n = \infty$ are identical with
corresponding entries in Table II.

4. If ordinates are to be drawn to cut off the extreme 10% of the area
under the curve, that is the extreme 5% in each tail, where should they
be drawn if

a. $n = 3$     b. $n = 16$     c. $n = 500$

5. If ordinates are to be drawn to enclose the middle 80% of the area
under the curve, where should they be drawn if

a. $n = 1$     b. $n = 10$     c. $n = 400$

6. If an observed value of $t$ has been found to be 2.8, what is the best
reading which Table IX furnishes for $P(t < -2.8 \text{ or } t > 2.8)$ when

a. $n = 1$     b. $n = 3$     c. $n = 500$


Tests About $\mu$ when $\sigma$ Is Unknown. Suppose a sample of
6 cases has $\bar{X} = 12.3$ and $s^2 = 4.8$. It is desired to test the
hypothesis $\mu = 15$ with a two-sided test at level $\alpha = .05$. The
statistic $s_{\bar{X}} = s/\sqrt{N}$ must be used. The ratio


$t = \dfrac{(\bar{X} - \mu)\sqrt{N}}{s}$


will have “Student’s” distribution with 5 degrees of freedom.
Substituting the given data we have


$t = \dfrac{(12.3 - 15)\sqrt{6}}{\sqrt{4.8}} = -3.02$


To complete the test of significance the observed value of
$t = -3.02$ must be compared with tabulated entries in Table IX,
where the critical region for the two-sided test with $\alpha = .05$ is
seen to be $t < -2.57$ and $t > 2.57$. The observed value falls in
this critical region and the hypothesis is rejected.
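The arithmetic of this test can be retraced in a few lines. This is a minimal sketch using the figures quoted above; the function name `one_sample_t` is invented for the illustration, and the critical value 2.57 is the tabulated one cited in the text.

```python
import math

def one_sample_t(mean, variance, n, mu0):
    """t = (mean - mu0) * sqrt(n) / s, where s is the square root of the sample variance."""
    return (mean - mu0) * math.sqrt(n) / math.sqrt(variance)

t_obs = one_sample_t(mean=12.3, variance=4.8, n=6, mu0=15)

# Two-sided critical value for 5 degrees of freedom at alpha = .05, as read from Table IX.
t_crit = 2.57
reject = abs(t_obs) > t_crit

print("t =", round(t_obs, 2), "reject hypothesis:", reject)
```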

Confidence Interval for $\mu$.* Formulas for calculating
confidence intervals for $\mu$ will be developed both for the situation
when $\sigma$ is known and when it is unknown. The confidence
coefficient will be taken as $1 - \alpha$ where $\alpha$ is some small value such as
.01 or .05, so that $1 - \alpha$ is .99 or .95.

Consider first the situation when $\sigma$ is known. Then the
following relation holds:


(7.5)    $P\left(z_{\alpha/2} < \dfrac{(\bar{X} - \mu)\sqrt{N}}{\sigma} < z_{1-\alpha/2}\right) = 1 - \alpha$


when $X$ is normally distributed or when $X$ is not normally
distributed but $N$ is large. Simple algebraic manipulation on the
inequality in parentheses yields another expression with the same
probability:


(7.6)    $P\left(\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{N}} < \mu < \bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{N}}\right) = 1 - \alpha$


The inequality within the parentheses provides confidence
limits for $\mu$ which are appropriate under the conditions stated,
namely that $X$ is normally distributed and $\sigma$ is known. Written
with the symbol $\bar{X}$ occurring in the limits of the interval, it is
readily understood that $\mu$ is a fixed, though usually unknown,
value and that it is the limits of the interval which are subject to
sampling variation. It should be already understood that the
probability given in (7.6) is the probability that $\bar{X}$ for a
randomly selected sample will simultaneously satisfy the two
inequalities


$\bar{X} + z_{\alpha/2}\dfrac{\sigma}{\sqrt{N}} < \mu$    and    $\bar{X} + z_{1-\alpha/2}\dfrac{\sigma}{\sqrt{N}} > \mu$.


After a sample has been drawn, $\bar{X}$ computed from it, and the two
limits obtained in numerical form, such as, say, 17.3 and 22.5,
then the inequality $17.3 < \mu < 22.5$ is either true or not true. The
probability of (7.6) does not apply to it. However the probability
statement of (7.6) provides us with a measurable degree of
confidence that such a pair of values obtained from a sample will
contain $\mu$ in, say, 95% of all determinations. We may express
that degree of confidence by such a statement as


$C(17.3 < \mu < 22.5) = .95$


* Before studying this section, the reader may find it profitable to reread
the discussion on confidence intervals in Chapter 3.


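When $\sigma$ is unknown, a formula analogous to (7.6) is written with
$s$ in place of $\sigma$, and $t$ in place of $z$:


(7.7)    $P\left(\bar{X} + t_{\alpha/2}\dfrac{s}{\sqrt{N}} < \mu < \bar{X} + t_{1-\alpha/2}\dfrac{s}{\sqrt{N}}\right) = 1 - \alpha$


The number of degrees of freedom associated with $t$ is $N - 1$.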
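Formula (7.7) can be sketched as a small function. The example applies it to the first sample of Table 7.1 ($\bar{X} = 106.6$, $s = 43.7$, $N = 5$). The value 2.776 is the standard tabulated $t_{.975}$ for 4 degrees of freedom and is supplied here as an assumption, since the text does not quote it. The function name is invented for the illustration.

```python
import math

def t_interval(mean, s, n, t_upper):
    """Limits of Formula (7.7); t_upper is the tabulated t_{1-alpha/2} for n - 1 d.f.

    Because the t curve is symmetrical, t_{alpha/2} = -t_upper.
    """
    half_width = t_upper * s / math.sqrt(n)
    return mean - half_width, mean + half_width

# First sample of Table 7.1; 2.776 is assumed from a standard t table (4 d.f.).
low, high = t_interval(mean=106.6, s=43.7, n=5, t_upper=2.776)
print(round(low, 1), round(high, 1))
```

The limits agree with the tabled interval to within the rounding of $s/\sqrt{N}$ used in the table.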

To help the reader develop a more vivid concept of the
fluctuation of the interval limits from sample to sample, data from
the four samples enumerated on Worksheet I in Chapter 6 have
been used to compute four confidence intervals with confidence
coefficient .95. These interval estimates are shown in the last
column of Table 7.1 under the heading $1 - \alpha = .95$. In this case
$\mu$ is known to be 121.6 and one can see that it is contained in all
four intervals. In the same table are also shown intervals with
confidence coefficient .50 made from the same four samples and it
may be noted that three of these contain $\mu$ and one does not.
Intervals with confidence coefficients .95, .90 and .50 from Table
7.1 and intervals with confidence coefficient .10 not in that table
are shown graphically in Figure 7-3. In each group the four
unbroken lines represent intervals obtained from the four samples
with $N = 5$ and the dotted line represents an interval obtained
from the composite sample with $N = 20$. The small tick in the
middle of each line represents the value of $\bar{X}$. Examining this
graph one understands why the confidence coefficient is in
practice never taken as a small number such as .10 or even .50.

Examination of Table 7.1 and Figure 7-3 leads to the
following interpretations:

1. If $N$ and $s$ are fixed, the narrower the interval the less
assurance there is that the population value is contained within it.
(Look at the four intervals for any given sample.)

2. If $N$ and the confidence coefficient are fixed, the smaller
the value of $s$ the narrower will the interval be. (Look at the four
unbroken lines in any one group, remembering that for these
samples $s_3 < s_1 < s_4 < s_2$.)

3. As $N$ increases, if the confidence coefficient is fixed, the
interval becomes narrower. (In any one group compare the dotted
line for $N = 20$ with four unbroken lines for $N = 5$.)

4. Samples of 5 cases provide very unreliable estimates. (Note
that when there is high confidence that the population value lies
in the interval the intervals are wide and that when intervals are
narrow there can be little confidence that the population value
lies in them.)

5. The confidence interval for the mean $\mu$ is symmetrically
placed around the observed mean $\bar{X}$. (The confidence interval
for the population variance discussed in the next chapter is not
symmetrically placed around the observed variance.)

Fig. 7-3. Interval estimates of $\mu$ made with confidence coefficients of .10,
.50, .90, and .95 from each of the four samples of 5 cases and from the composite
sample of 20 cases. (Samples as given on Worksheet I.) Dotted line is for
composite sample. Horizontal tick is mean of sample.


TABLE 7.1 Interval Estimates with Confidence Coefficients of .50, .90, and .95
Made for the Population Mean $\mu$ from the Data of the Four Samples of 5 Cases
and of the Composite Sample of 20 Cases on Worksheet I.


                                              Interval estimate
Sample      $\bar{X}$    $s$    $s/\sqrt{N}$   $1-\alpha = .50$         $1-\alpha = .90$         $1-\alpha = .95$
1           106.6    43.7   19.5    $92.2 < \mu < 121.0$    $65.0 < \mu < 148.2$    $52.5 < \mu < 160.7$
2           104.6    59.3   26.5    $85.0 < \mu < 124.2$    $48.1 < \mu < 161.1$    $31.0 < \mu < 178.1$
3           126.2    28.5   12.7    $117.0 < \mu < 135.4$   $99.1 < \mu < 153.2$    $90.9 < \mu < 161.5$
4           122.0    51.7   23.1    $104.9 < \mu < 139.1$   $72.8 < \mu < 171.2$    $57.9 < \mu < 186.1$
Composite   114.85   44.4   9.92    $108.0 < \mu < 121.7$   $97.7 < \mu < 132.0$    $94.2 < \mu < 135.5$

6. If the interval were narrowed to the single point $\mu = \bar{X}$,
its confidence coefficient would be zero. If a confidence coefficient
of 1.00 were required, the interval would have to be infinitely wide.
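The meaning of the confidence coefficient can also be illustrated by simulation, in the spirit of the sampling experiment of Chapter 6. The sketch below draws many samples of 5 cases from a normal population and counts how often the interval of Formula (7.7) contains $\mu$. The population standard deviation of 45, the number of trials, and the seed are arbitrary assumptions. The multipliers 2.776 and 0.741 are the standard tabulated $t_{.975}$ and $t_{.75}$ for 4 degrees of freedom, not values quoted in the text.

```python
import math
import random
import statistics

def coverage(mu, sigma, n, t_upper, trials, seed=0):
    """Fraction of simulated samples whose t interval contains mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        mean = statistics.fmean(sample)
        s = statistics.stdev(sample)          # divisor n - 1
        half_width = t_upper * s / math.sqrt(n)
        if mean - half_width < mu < mean + half_width:
            hits += 1
    return hits / trials

# mu = 121.6 as in the text; sigma = 45 is an assumed value for illustration.
wide = coverage(mu=121.6, sigma=45.0, n=5, t_upper=2.776, trials=4000)
narrow = coverage(mu=121.6, sigma=45.0, n=5, t_upper=0.741, trials=4000)
print("coefficient .95:", wide, " coefficient .50:", narrow)
```

The observed proportions should fall close to the nominal coefficients, with the narrower intervals missing $\mu$ about half the time.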


EXERCISE 7.3 

1. From the samples you obtained in the sampling experiment of 
Chapter 6 compute a set of interval estimates by Formula (7.7) and draw 
lines on Figure 7-3 to represent these intervals. You may wish to use the 
same confidence coefficients employed there and thus increase the number 
of lines in each group, or to use another coefficient, say .80. 

2. By substituting observed values of $\bar{X}$ and $s$ and values read from
tables of “Student’s” distribution, verify the interval estimates in Table
7.1 for confidence coefficients of .95 and .90.

3. What proportion of all samples is expected to yield interval estimates
that contain $\mu$ if $1 - \alpha = .95$? Of the 4 interval estimates made from the
4 observed samples how many did contain $\mu$? Answer the same question
for $1 - \alpha = .90$, .50, and .10.

4. If a sample of 40 cases has a mean $\bar{X} = 27.2$ and a variance $s^2 = 13.6$,
what interval estimate can be made for $\mu$ at the .99 level of confidence?
At the .95 level?

Mean of a Population of Differences between Two Measures 
for Each Individual. This is a very common situation in research. 
Sometimes each individual is measured at the beginning and again 
at the end of an experiment and the gain is then treated as the 
basic measure to be studied. Sometimes two subjects are paired 
in such a way that the pair constitutes one individual and the 
difference between scores for members of the pair becomes the
basic measure. An illustration may be taken from Wrightstone’s
Appraisal of Newer Elementary School Practices. In each of six
communities he selected two schools one of which he classified
as “progressive” and one as “conservative.” A test of knowledge
of current affairs was given to children in each school and the
school mean computed, with results as shown here. $X_p$ and $X_c$
indicate respectively the score of a “progressive” and of a
“conservative” school, score for the school being the mean of the scores
of all the children studied.


Community    $X_p$     $X_c$     $D = X_p - X_c$    $D^2$
A            32.7     28.2      4.5            20.25
B            33.1     29.1      4.0            16.00
C            35.2     31.9      3.3            10.89
D            36.0     33.1      2.9             8.41
E            36.2     31.4      4.8            23.04
F            39.1     35.3      3.8            14.44

Total       212.3    189.0     23.3            93.03


We must consider a population of communities in each of
which there is a “progressive” and a “conservative” school for
which a difference score $D$ can be obtained. Such a population is
conceptual. It would be an enormously difficult task to
enumerate all the communities in this population and take a sample
from them. We are interested in the hypothesis that for this
conceptual population the expectation of all such differences is
zero, $\mu_D = 0$. To test it we shall need the observed mean and
standard deviation $\bar{D}$ and $s_D$ for the six communities in the sample
and the standard error of $\bar{D}$:


(7.8)     $\bar{D} = \dfrac{\sum D}{N}$

(7.9)     $s_D^2 = \dfrac{N\sum D^2 - (\sum D)^2}{N(N-1)}$

(7.10)    $s_{\bar{D}} = \dfrac{s_D}{\sqrt{N}} = \sqrt{\dfrac{N\sum D^2 - (\sum D)^2}{N^2(N-1)}}$


Then

$t = \dfrac{\bar{D}}{s_{\bar{D}}} = \dfrac{\bar{D}}{\sqrt{\dfrac{N\sum D^2 - (\sum D)^2}{N^2(N-1)}}}$


which can be more easily computed by the equivalent formula


(7.11)    $t = \dfrac{\sum D}{\sqrt{\dfrac{N\sum D^2 - (\sum D)^2}{N-1}}}$


By Formula (7.11)    $t = \dfrac{23.3}{\sqrt{\dfrac{6(93.03) - (23.3)^2}{5}}} = 13.3$
We find in Table IX on the row for which $n = 5$ that $t_{.9995} = 6.86$,
and the observed value of $t$ is very much larger than this tabulated
value. Consequently it is almost inconceivable (though not
impossible) that the population difference is zero. The statistical
analysis does not indicate the cause of the difference but indicates
strongly that the difference is positive and should not be
attributed to chance and that therefore it has meaning, is significant.
Naturally a research worker will examine his procedures carefully 
to see if any influence other than the one he is studying could have 
crept into his data and will not assume a significant difference to 
be due to the experimental factor unless there seems no other 
reasonable way to account for it. 

The quotient $t = \bar{D}/s_{\bar{D}}$ has “Student’s” distribution with
$N - 1 = 5$ degrees of freedom. Hence an interval estimate at
confidence level .99 may be made for the population difference by
the general plan of Formula (7.7) as follows:


$3.88 - (4.032)(.291) < \mu_D < 3.88 + (4.032)(.291)$
$2.71 < \mu_D < 5.05$


Therefore the mean advantage enjoyed by the “progressive”
school over the “conservative” with respect to knowledge of
current affairs on the part of the pupils may be estimated at
somewhere between 2.7 and 5.1 points.
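The computations for the Wrightstone data can be retraced as follows. This is a sketch only: the variable names are invented here, the school means are those tabulated above, and 4.032 is the multiplier used in the text for the .99 interval with 5 degrees of freedom.

```python
import math

progressive = [32.7, 33.1, 35.2, 36.0, 36.2, 39.1]
conservative = [28.2, 29.1, 31.9, 33.1, 31.4, 35.3]

# Difference score D for each community.
d = [p - c for p, c in zip(progressive, conservative)]
n = len(d)
sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

mean_d = sum_d / n                                                      # Formula (7.8)
se_mean_d = math.sqrt((n * sum_d2 - sum_d ** 2) / (n ** 2 * (n - 1)))   # Formula (7.10)
t_obs = sum_d / math.sqrt((n * sum_d2 - sum_d ** 2) / (n - 1))          # Formula (7.11)

# .99 interval estimate using the multiplier 4.032 quoted in the text.
t_995 = 4.032
low = mean_d - t_995 * se_mean_d
high = mean_d + t_995 * se_mean_d

print(round(mean_d, 2), round(se_mean_d, 3), round(t_obs, 1))
print(round(low, 2), round(high, 2))
```

Small differences in the last decimal place from the printed limits can arise because the text rounds $\bar{D}$ and $s_{\bar{D}}$ before multiplying.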

Authors do not usually publish the raw scores on which their
computations are based, although such data would often be of
great value to other persons working in the same field. Suppose
a published study gives the mean and variance of a sample at the
beginning and at the end of an experimental period and also gives
the correlation coefficient between scores at the beginning and
at the end but does not give individual scores. The differences
$D_i = X_{1i} - X_{2i}$ cannot be computed as was done for the
Wrightstone data and so their variance cannot be computed directly.
However the mean and variance of these differences can be
obtained from the mean and variance of each set of scores and the
correlation * between them, as


(7.12)    $\bar{D} = \bar{X}_1 - \bar{X}_2$
(7.13)    $s_D^2 = s_1^2 - 2 r_{12} s_1 s_2 + s_2^2$

An illustration may be taken from a study by F. L. Westover.
This is a study of three methods of improving reading speed and
comprehension among college students. For our present purposes
we shall deal with one small aspect of this study only. For the
45 students who were taught by the method of controlled eye
movements the mean and standard deviation on an initial test
of rate of reading were as given on page 42 of Westover’s study:


                        $\bar{X}$      $s$
$X_1$ = Initial test    39.20    5.97
$X_2$ = Final test      43.07    7.85
$D$ = Gain               3.87


The coefficient of correlation between the tests was $r_{12} = .51$.

Then    $s_D^2 = (5.97)^2 + (7.85)^2 - 2(.51)(5.97)(7.85) = 49.46$

and     $s_{\bar{D}}^2 = \dfrac{49.46}{45} = 1.099$

and     $t = \dfrac{3.87}{\sqrt{1.099}} = 3.7$


There can be little doubt that use of this method of instruction 
results in improved reading when applied to individuals like those 
in this study. 
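When only summary statistics are published, Formula (7.13) supplies the variance of the differences. The sketch below retraces the Westover computation; the function name is invented for the illustration, and the gain is taken as final minus initial score, as in the table above.

```python
import math

def paired_t_from_summary(mean1, s1, mean2, s2, r12, n):
    """t for the mean gain, using only means, standard deviations and r."""
    mean_d = mean2 - mean1                              # gain: final minus initial
    var_d = s1 ** 2 - 2 * r12 * s1 * s2 + s2 ** 2       # Formula (7.13)
    se_mean_d = math.sqrt(var_d / n)                    # standard error of the mean gain
    return mean_d, var_d, mean_d / se_mean_d

mean_d, var_d, t_obs = paired_t_from_summary(
    mean1=39.20, s1=5.97, mean2=43.07, s2=7.85, r12=0.51, n=45
)
print(round(mean_d, 2), round(var_d, 2), round(t_obs, 1))
```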

Test of Hypotheses About $\mu_1 - \mu_2$. A. When $\sigma_1$ and $\sigma_2$ are
Known. Assume that a sample of $N_1$ cases has been drawn from
a normal population with mean $\mu_1$ and variance $\sigma_1^2$, and another
independent sample of $N_2$ cases has been drawn from a normal
population with mean $\mu_2$ and variance $\sigma_2^2$. The two sample means
$\bar{X}_1$ and $\bar{X}_2$ are distributed with means $\mu_1$ and $\mu_2$ and standard
deviations $\sigma_1/\sqrt{N_1}$ and $\sigma_2/\sqrt{N_2}$. The mean difference $\bar{X}_1 - \bar{X}_2$ is
distributed normally with mean $\mu_1 - \mu_2$ and variance


(7.14)    $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}$


* Readers who have not become at least superficially acquainted with the cor- 
relation coefficient in an introductory course might defer this section until they have 
studied Chapter 10, or might read it now taking the concept of correlation on faith. 
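Then the quotient


(7.15)    $z = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$


has the unit normal distribution.

When the hypothesis tested is that $\mu_1 - \mu_2 = 0$, Formula (7.15)
reduces to


(7.16)    $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{N_1} + \dfrac{\sigma_2^2}{N_2}}}$


In the special case when the two populations have the same
variance, $\sigma_1^2 = \sigma_2^2 = \sigma^2$, the variance of the mean difference
becomes


(7.17)    $\sigma^2_{\bar{X}_1 - \bar{X}_2} = \sigma^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)$


and the ratio used to test the null hypothesis $\mu_1 - \mu_2 = 0$ can be
reduced to the form


(7.18)    $z = \dfrac{\bar{X}_1 - \bar{X}_2}{\sigma}\sqrt{\dfrac{N_1 N_2}{N_1 + N_2}}$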
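That the equal-variance form is only a rearrangement of Formula (7.16) can be checked numerically. The figures below are hypothetical, chosen purely for illustration, and the function names are not from the text.

```python
import math

def z_general(mean1, mean2, sigma1, sigma2, n1, n2):
    """Formula (7.16): separate known standard deviations."""
    return (mean1 - mean2) / math.sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)

def z_common_sigma(mean1, mean2, sigma, n1, n2):
    """Reduced form for a common known standard deviation."""
    return (mean1 - mean2) / sigma * math.sqrt(n1 * n2 / (n1 + n2))

# Hypothetical data: means 52 and 50, common sigma 10, samples of 40 and 60.
z_a = z_general(52.0, 50.0, 10.0, 10.0, 40, 60)
z_b = z_common_sigma(52.0, 50.0, 10.0, 40, 60)
print(round(z_a, 4), round(z_b, 4))
```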


B. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Equal. In a
study which has as its aim the comparison of the means of two
populations, both the means and the standard deviations must
ordinarily be estimated from the samples. To illustrate the type
of data which may be obtained in such a study consider the
following unpublished data from scores on a mathematics test
obtained from a sample of 12 full-time women students at Barnard
College and a sample of 11 part-time women students enrolled in
the School of General Studies at Columbia. The means were
respectively 82.08 and 72.64, and it is desired to test the
hypothesis that the population means are equal, $\mu_1 - \mu_2 = 0$.

Such a test is frequently carried out by using Formula (7.16), 
substituting sample standard deviations for population standard 
deviations, and referring the results to a normal probability table. 
However this substitution produces a statistic in which the de- 
nominator as well as the numerator is subject to sampling error. 

If the sample variances are not inconsistent with the hypothesis
that population variances are equal, the test for which will be
described in Chapter 8, a more satisfactory procedure is to
estimate this common variance and to replace $\sigma^2$ in Formula (7.17)
by that estimate. An unbiased estimate of $\sigma^2$ based on data from
2 samples is provided by


(7.19)    $s^2 = \dfrac{\sum_{i=1}^{N_1}(X_{i1} - \bar{X}_1)^2 + \sum_{i=1}^{N_2}(X_{i2} - \bar{X}_2)^2}{N_1 + N_2 - 2}$


When raw scores are available, Formula (7.19) may be
conveniently reduced to


(7.20)    $s^2 = \dfrac{\sum X_1^2 + \sum X_2^2 - \dfrac{(\sum X_1)^2}{N_1} - \dfrac{(\sum X_2)^2}{N_2}}{N_1 + N_2 - 2}$


When raw scores are not available but standard deviations or
variances are, the formula may be written as


(7.21)    $s^2 = \dfrac{(N_1 - 1)s_1^2 + (N_2 - 1)s_2^2}{N_1 + N_2 - 2}$


These various formulas are algebraically equivalent but if raw
scores are at hand it would be obviously inefficient to compute $s$
for each sample and insert it in Formula (7.21) because the routine of (7.20)
involves less rounding error. After estimating the common
population variance by Formula (7.20) or (7.21), the obtained estimate
is substituted for $\sigma^2$ in Formula (7.17) to obtain an estimate of
the variance of the difference between the two means:


(7.22)    $s^2_{\bar{X}_1 - \bar{X}_2} = s^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)$


Then when $\mu_1 - \mu_2 = 0$, the formula for $t$ becomes


(7.23)    $t = \dfrac{\bar{X}_1 - \bar{X}_2}{\sqrt{s^2\left(\dfrac{N_1 + N_2}{N_1 N_2}\right)}}$


and this has “Student’s” distribution with $N_1 + N_2 - 2$ degrees of
freedom.

Now let us return to the problem concerning the comparison
of Barnard and general studies students. For each group we need
to know the values of $N$, $\sum X$ and $\sum X^2$. These data are


                          $N$     $\sum X$    $\sum X^2$    $\bar{X}$
Barnard group             12     985      83691      82.08
General studies group     11     799      62149      72.64


Substituting these numbers in Formula (7.20) gives


$s^2 = \dfrac{83691 + 62149 - \dfrac{(985)^2}{12} - \dfrac{(799)^2}{11}}{12 + 11 - 2} = 331.05$


From Formula (7.23) we then have


$t = \dfrac{82.08 - 72.64}{\sqrt{331.05\left(\dfrac{12 + 11}{(12)(11)}\right)}} = 1.24$


Since $t = 1.24$ is well within the customary region of
acceptance, we accept the hypothesis $\mu_1 = \mu_2$ and say the Barnard
students have not been shown to differ from general studies
students in respect to scores on this arithmetic test. It must
however be noted that when samples are small and variability
large, the observed difference must be very large to appear
significant. The failure to find a significant difference may be due to
the small number of cases examined rather than to the equality
of population means.
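The pooled-variance computation can be sketched directly from the sums $N$, $\sum X$ and $\sum X^2$ by Formulas (7.20), (7.22) and (7.23). The function name is invented for the illustration; the input figures are those of the Barnard and general studies groups.

```python
import math

def pooled_t(n1, sum1, sumsq1, n2, sum2, sumsq2):
    """Pooled variance (7.20), then t (7.23) with its degrees of freedom."""
    df = n1 + n2 - 2
    s2 = (sumsq1 + sumsq2 - sum1 ** 2 / n1 - sum2 ** 2 / n2) / df
    se_diff = math.sqrt(s2 * (n1 + n2) / (n1 * n2))     # square root of (7.22)
    t = (sum1 / n1 - sum2 / n2) / se_diff
    return s2, t, df

s2, t_obs, df = pooled_t(12, 985, 83691, 11, 799, 62149)
print(round(s2, 1), round(t_obs, 2), df)
```

Carried without intermediate rounding, the variance estimate agrees with the printed value to within rounding, and $t$ is unchanged to two decimals.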

C. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Unequal and
Samples are Large. If the test for equality of the variance to be
described in Chapter 8 has been applied and indicates that it is
not reasonable to consider the two populations as equally
variable, then estimating a common variance as described in the
preceding section would be inappropriate. In this situation the choice
of the method for estimating the standard error of the mean
difference depends upon the size of the samples.

If the samples are not small, say 30 or more cases in each, the
standard error of the mean difference may be computed as


(7.24)    $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}$

(7.25)    Then    $t = \dfrac{\bar{X}_1 - \bar{X}_2 - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}}$


may be referred to a table of normal probability because $n$ is so
large that the statistic has a distribution approximating the
normal.

D. When $\sigma_1$ and $\sigma_2$ are Unknown but Presumed Unequal and
Samples are Small. Suppose samples of size $N_1$ and $N_2$ are drawn
from normal populations with unknown means and variance
and suppose it appears unreasonable to assume that $\sigma_1 = \sigma_2$. (A
test for the hypothesis $\sigma_1 = \sigma_2$ is given in Chapter 8.) To test the
hypothesis $\mu_1 = \mu_2$ set $\mu_1 - \mu_2 = 0$ in Formula (7.25) and the
resulting statistic has “Student’s” distribution with degrees of
freedom given by the expression


(7.26)    $n = \dfrac{\left(\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}\right)^2}{\left(\dfrac{s_1^2}{N_1}\right)^2\dfrac{1}{N_1 + 1} + \left(\dfrac{s_2^2}{N_2}\right)^2\dfrac{1}{N_2 + 1}} - 2$


The value of $n$ given by this formula will seldom be an integer,
but usually a good approximation can be obtained by using the
nearest integer.


Example: Suppose the sample data are


$N_1 = 10$,  $\bar{X}_1 = 35.3$,  $s_1^2 = 8$,  $\alpha = .05$
$N_2 = 8$,   $\bar{X}_2 = 31.1$,  $s_2^2 = 24$


Then    $\dfrac{s_1^2}{N_1} = .8$    and    $\dfrac{s_2^2}{N_2} = 3.0$

$t = \dfrac{35.3 - 31.1}{\sqrt{.8 + 3}} = \dfrac{4.20}{\sqrt{3.8}} = 2.15$

$n = \dfrac{(.8 + 3)^2}{\dfrac{(.8)^2}{11} + \dfrac{(3)^2}{9}} - 2 = 13.6 - 2 = 11.6$


If $n$ be taken as 11, the region of acceptance is $-2.20 < t < 2.20$;
if $n$ be taken as 12, the region of acceptance is $-2.18 < t < 2.18$;
and in either case the statistic falls in the region of acceptance.
Interpolation is unnecessary to reach a decision.

Formula (7.26) is an approximation to a longer formula given
by Welch in the reference at the end of this chapter.
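The example can be retraced with a short function implementing Formula (7.25), with $\mu_1 - \mu_2 = 0$, and Formula (7.26) as printed. The function name is an invention for this sketch; the data are those of the example.

```python
import math

def unequal_variance_t(mean1, var1, n1, mean2, var2, n2):
    """t of (7.25) under mu1 - mu2 = 0, and approximate d.f. of (7.26)."""
    a = var1 / n1
    b = var2 / n2
    t = (mean1 - mean2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 + 1) + b ** 2 / (n2 + 1)) - 2
    return t, df

t_obs, df = unequal_variance_t(35.3, 8, 10, 31.1, 24, 8)
print(round(t_obs, 2), round(df, 1))

# Either neighbouring integer for n gives the same decision, as noted in the text.
accept_11 = -2.20 < t_obs < 2.20
accept_12 = -2.18 < t_obs < 2.18
print(accept_11, accept_12)
```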

E. When $\mu_1$ and $\mu_2$ are Themselves Differences. Sometimes a
research worker has a problem in which he must compare the
difference of two means for group A with the difference of two similar
means for group B, the two groups being independent.

If the difference between the two means for one group is based
on two measures of the same individual, the difference can be
treated in the manner described on page 151. Thus if the problem
were to find whether children taught by Method I gained more
than children taught by Method II during a specified period, the
first step would be to find the gain for each child, as the difference
between his initial and final scores. Suppose that difference is
called $D$. Then $\bar{X}_D$ and $s_D$ are computed for each group. The
hypothesis $\mu_{D_1} - \mu_{D_2} = 0$ is then tested by the methods described
in sections A, B, C, or D, whichever is appropriate.

Consider, however, the situation in which no pairing of scores
is possible. Suppose a difference between sexes has been found
at one age level and another difference found at another age level.
To formalize the statement of the hypothesis, it will be assumed
that there are four populations, one at each age level for each sex
and that each population is normally distributed. It will be
further assumed that each population has the same standard
deviation, $\sigma$, but that the four means may possibly be different.
For convenience we shall call the four populations A, B, C and D
as in the adjacent display. Suppose now that a random sample
is taken from each population. These samples need not be of the
same size. For each sample the statistics $\sum X$ and $\sum X^2$ are
computed, as in the adjacent display.


TABLE 7.2 Statistics Required for Comparison of Two
Mean Differences


Population              Population values    Sample statistics required

A. Male, aged 10        $\mu_A$, $\sigma$            $N_A$, $\sum X_A$, $\sum X_A^2$
B. Female, aged 10      $\mu_B$, $\sigma$            $N_B$, $\sum X_B$, $\sum X_B^2$
C. Male, aged 16        $\mu_C$, $\sigma$            $N_C$, $\sum X_C$, $\sum X_C^2$
D. Female, aged 16      $\mu_D$, $\sigma$            $N_D$, $\sum X_D$, $\sum X_D^2$

An estimate of $\sigma^2$ based on raw data is given by


(7.27)    $s^2 = \dfrac{\sum X_A^2 + \sum X_B^2 + \sum X_C^2 + \sum X_D^2 - \dfrac{(\sum X_A)^2}{N_A} - \dfrac{(\sum X_B)^2}{N_B} - \dfrac{(\sum X_C)^2}{N_C} - \dfrac{(\sum X_D)^2}{N_D}}{N_A + N_B + N_C + N_D - 4}$


If raw data are not available but the variances are, then the same
result can be obtained by

(7.28)    $s^2 = \dfrac{(N_A - 1)s_A^2 + (N_B - 1)s_B^2 + (N_C - 1)s_C^2 + (N_D - 1)s_D^2}{N_A + N_B + N_C + N_D - 4}$

If the hypothesis to be tested is that $(\mu_A - \mu_B) = (\mu_C - \mu_D)$ then
the statistic


(7.29)    $t = \dfrac{(\bar{X}_A - \bar{X}_B) - (\bar{X}_C - \bar{X}_D)}{s\sqrt{\dfrac{1}{N_A} + \dfrac{1}{N_B} + \dfrac{1}{N_C} + \dfrac{1}{N_D}}}$


has “Student’s” distribution with $N_A + N_B + N_C + N_D - 4$
degrees of freedom.

A comparison of differences of means will be treated in Chapter
14 under the topic of interaction.
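The comparison of two mean differences can be sketched from summary statistics using the pooled variance of Formula (7.28). All figures below are hypothetical and serve only to show the arithmetic; the function name and data layout are likewise assumptions of this sketch.

```python
import math

def difference_of_differences_t(groups):
    """groups maps 'A', 'B', 'C', 'D' to (N, mean, variance) for each sample."""
    df = sum(n for n, _, _ in groups.values()) - 4
    # Pooled estimate of the common variance, Formula (7.28).
    s2 = sum((n - 1) * var for n, _, var in groups.values()) / df
    contrast = (groups["A"][1] - groups["B"][1]) - (groups["C"][1] - groups["D"][1])
    se = math.sqrt(s2 * sum(1 / n for n, _, _ in groups.values()))
    return contrast / se, df

# Hypothetical summary data: (N, mean, variance) for each of the four samples.
groups = {
    "A": (10, 50.0, 16.0),
    "B": (10, 48.0, 16.0),
    "C": (10, 60.0, 16.0),
    "D": (10, 55.0, 16.0),
}
t_obs, df = difference_of_differences_t(groups)
print(round(t_obs, 3), df)
```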
EXERCISE 7.4

1. Terman and Miles in Sex and Personality give masculinity ratings
for a group of gifted boys with intelligence quotients of 140 and above and
for a control group of unselected boys, as follows:


                 $N$      $\bar{X}$      $s$
Gifted boys     290    15.22    1.45
Controls        161    14.90    1.49


Do the data give warrant for asserting that gifted boys tend to have higher
masculinity ratings than others?

2. In the study by Westover previously quoted the following data
were obtained for the 45 students taught by the use of special practice
exercises:


                                      $\bar{X}$       $s$
Initial score on vocabulary test     55.96    4.92      $r_{12} = .75$
Final score on vocabulary test       57.44    7.56


Test the hypothesis that increase in mean score is due to chance.

3. Long reports a study of a comparison of the motor abilities of deaf
and hearing children. Deaf subjects were children in residence at the
Institution for the Improved Instruction of Deaf Mutes. These were
paired with hearing children of the same age and sex. No subjects in
either group had known defects other than deafness which might affect
motor performance. Seven tests of motor discrimination were given to all
subjects and the results analyzed separately for the boys and the girls.

In Table 7.3 is the list of raw scores for 10 out of 37 girl pairs on
a test of balance and on a test of grip for the average of the two hands.
Scores of deaf children are marked D and scores of hearing children
marked H.


TABLE 7.3 Scores on Test of Grip and Test of Balance by 10 Deaf Girls (D)
and 10 Hearing Girls (H) Paired on Basis of Age*

[Columns: Pair; Score on grip (D, H); Score on balance (D, H). Table entries not recoverable.]

* Data from Long.

For each trait make the appropriate computations and test the
hypothesis that the difference between deaf and hearing girls is zero.


Alternatives to a Hypothetical Mean. Suppose that a
population can be assumed to be normal and to have a standard
deviation, $\sigma = 10$, but that the mean $\mu$ is unknown. Suppose the
hypothesis is formulated that $\mu = 50$, and is tested by a sample
of 25 observations. The critical region is constructed at a
significance level of .05 and the assumption that $\mu$ is actually 50. Since
$\sigma/\sqrt{N}$ is 2 and $1.96\sigma/\sqrt{N}$ is 3.92 the critical region consists of all
values of $\bar{X}$ less than 46.08 or greater than 53.92.

If the hypothesis, $\mu = 50$, is true, the probability is .05 that
$\bar{X}$ will have a value in the critical region. But what is the
probability that $\bar{X}$ will have a value in the critical region when $\mu$ has
some value other than 50? Consider the three alternatives
$\mu = 48$, $\mu = 52$, and $\mu = 56$. If any one of these alternatives is true
this alternative will be the actual mean of the distribution of $\bar{X}$.
The standard error is unchanged for these alternatives since $\sigma$
is assumed known. The shaded areas in Figure 7-4 represent the
probabilities that $\bar{X}$ will fall in the critical region if each of the
alternatives $\mu = 48$, $\mu = 50$, $\mu = 52$, and $\mu = 56$ is true. These
probabilities are respectively .17, .05, .17, and .85.

Fig. 7-4. Distribution of $\bar{X}$ when $N = 25$, $\sigma = 10$, and $\mu = 48$, 50, 52, or 56.
Under H: $\mu = 50$, critical region with $\alpha = .05$ is $\bar{X} < 46.1$ and $\bar{X} > 53.9$.
Shaded area indicates probability that $\bar{X}$ will fall into critical region. This
probability is .17 if $\mu = 48$, .05 if $\mu = 50$, .17 if $\mu = 52$, and .85 if $\mu = 56$.
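These four probabilities can be recomputed from the normal distribution of $\bar{X}$. The sketch below uses Python's standard-library `statistics.NormalDist`; the function name `rejection_probability` and its default arguments are choices made for this illustration.

```python
from statistics import NormalDist

def rejection_probability(mu_true, mu0=50.0, sigma=10.0, n=25, z_crit=1.96):
    """Probability that the sample mean falls in the two-sided critical region."""
    se = sigma / n ** 0.5                     # standard error of the mean
    lower = mu0 - z_crit * se                 # 46.08 for the values in the text
    upper = mu0 + z_crit * se                 # 53.92 for the values in the text
    sampling_dist = NormalDist(mu=mu_true, sigma=se)
    return sampling_dist.cdf(lower) + (1 - sampling_dist.cdf(upper))

for mu in (48, 50, 52, 56):
    print("mu =", mu, "probability of rejection =", round(rejection_probability(mu), 2))
```

Evaluating the same function over a finer grid of $\mu$ values traces how the probability of rejection changes with the true mean.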

These probabilities suggest that whenever the true value of $\mu$
differs greatly from the hypothetical value being tested (which in
this case is 50) the probability of rejecting the hypothesis is great
but when the true value of $\mu$ is not very different from the value
stated in the hypothesis tested, the probability of rejection is
small. In Figure 7-5 the horizontal axis represents values of $\mu$.
The hypothetical value being tested is presumed to be $\mu = 50$.
An ordinate drawn to a given point on the baseline represents
for that value of $\mu$ the probability of rejecting the hypothesis
$\mu = 50$. The probability that a test of significance will lead to
rejection of a hypothesis is called the power of the test. The curve
shown in Figure 7-5 is called a power function. As already stated,
the power function of a test plays an important part in planning
the conduct of a study and the analysis of results.

Fig. 7-5. Power function of test of hypothesis $\mu = 50$
when $N = 25$, $\sigma = 10$, and $\alpha = .05$. (Horizontal axis: scale of $\mu$,
40 to 60; vertical axis: probability of rejecting H.)

Choice of Critical Region. The choice of one-sided or
two-sided critical region has already been discussed in Chapter 3 in
relation to proportions. A brief discussion of this problem in
relation to means seems desirable. Three situations will be
considered.

A. H: $\mu_1 = \mu_2$. Under this hypothesis it is necessary to guard
oneself against both the alternatives $\mu_1 > \mu_2$ and $\mu_1 < \mu_2$. The
risk of being wrong can then be minimized best by using a
two-sided critical region, for such a region leads to high probabilities of
rejection of the hypothesis both when $\mu_1$ is much greater or much
less than $\mu_2$. The critical region is then $t < t_{\alpha/2}$ and $t > t_{1-\alpha/2}$.

B. H: $\mu_1 \le \mu_2$. This hypothesis means that one is satisfied
if $\mu_1$ is less than $\mu_2$ regardless what the difference may be. The
alternatives against which it is necessary to guard oneself are
$\mu_1 > \mu_2$. The critical region which leads most often to rejection
of the hypothesis when it is false is $t > t_{1-\alpha}$.

C. H: $\mu_1 \ge \mu_2$. By analogous reasoning the critical region
for this hypothesis is $t < t_{\alpha}$.
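The three situations can be summarized as a small decision rule. In this sketch the function name and the hypothesis labels are invented for illustration. The critical values 2.080 and 1.721 are the standard tabulated $t_{.975}$ and $t_{.95}$ for 21 degrees of freedom, supplied here as assumptions. The value 1.24 is the observed $t$ from the Barnard comparison, and 2.50 is a hypothetical value.

```python
def in_critical_region(t_obs, hypothesis, t_low, t_high):
    """Return True when t_obs falls in the critical region for the hypothesis.

    For a two-sided test pass t_{alpha/2} and t_{1-alpha/2};
    for a one-sided test pass t_{alpha} and t_{1-alpha}.
    """
    if hypothesis == "mu1 = mu2":       # situation A: two-sided region
        return t_obs < t_low or t_obs > t_high
    if hypothesis == "mu1 <= mu2":      # situation B: upper tail only
        return t_obs > t_high
    if hypothesis == "mu1 >= mu2":      # situation C: lower tail only
        return t_obs < t_low
    raise ValueError("unknown hypothesis: " + hypothesis)

# 21 degrees of freedom, alpha = .05 (assumed standard table values).
two_sided = in_critical_region(1.24, "mu1 = mu2", -2.080, 2.080)
upper_tail = in_critical_region(1.24, "mu1 <= mu2", -1.721, 1.721)
hypothetical = in_critical_region(2.50, "mu1 <= mu2", -1.721, 1.721)
print(two_sided, upper_tail, hypothetical)
```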

The Number of Cases Needed to Reject a Hypothesis
Concerning $\mu$ When Some Alternative Is True. A problem similar in
purpose to this one was discussed in Chapter 3 but there the 
hypothesis related to the population proportion. The decision 
as to how large a sample shall be taken is always a very funda- 
mental aspect of the plans for an investigation. If a difference 
actually exists between two population means but the samples 
taken are too small, the observed difference in the sample means 
may be nonsignificant. Then all the work of the study is wasted, 
whereas somewhat larger samples would have produced a positive 
conclusion. On the other hand, to take samples larger than necessary
to establish the mean difference is a waste of time and money
which could be better expended otherwise. When an observed 
difference is nonsignificant, the research worker is usually in a 
quandary as to how to interpret his findings. Is the real difference 
zero, or has he used too small a sample to establish a difference 
which actually exists? He can make a far stronger statement in 
this situation if he has determined in advance how many cases he 
would need to find a difference of a predetermined size with a 
specified risk of error. Having determined the value of N in ad- 
vance by a little algebra, if he carries out the study as planned and 
obtains a nonsignificant result, he can dismiss the argument that 
his sample was too small, and can state, with a specified degree 
of confidence, that the population difference is less than the pre- 
determined value. 

To make the decision as to size of N, practical experience 
must be relied on for answers to these questions: 

1. How large a difference (d) would it be of practical importance
to find if it exists in the population? For example, a clinical
psychologist may say “If schizophrenics differ from normals by
so-and-so much I want to reject the hypothesis that their means
are equal; if the difference is less than that amount I do not care
whether we find it or not.”

2. What estimate can be made for the population variance?
Unless the area of investigation is entirely new, it is often possible
to get from previous studies a rough estimate of the variance, or
perhaps an estimate that it lies between two specified values.

3. How much risk can be taken of deciding that a difference
exists when it really is zero? This decision is the choice of α.

4. How much risk can be taken of deciding that a difference
is zero when it really is as large as the predetermined value d?
This is β, the risk of error of the second kind.

When these four values (d, σ², α, and β) have been chosen on
nonstatistical grounds, the statistician can determine the necessary
size of N by some easy algebra.

Suppose we wish to test the hypothesis that μ = 75 with a
two-sided test at .02 significance level. We have some prior knowledge
about variability and we hazard a guess that σ = 8. We
decide that if the actual value of μ differs from 75 by 5 points or
more in either direction, we wish to detect the existence of a difference
d with probability .60, that is, we want β to be no greater
than .40 for any alternative outside the range 70 to 80. How large
a sample will be required?

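The specified values are these:

    $\mu_H = 75$ = value under the hypothesis      $d_1 = 80 - 75 = 5$
    $\mu_A = 80$ = one alternative value           $d_2 = 70 - 75 = -5$
    $\mu_A = 70$ = second alternative value        α = .02, β = .40, σ = 8

[Sketch: sampling distributions of $\bar X$ centered at 75 and at 80; the points B, 75, C, and 80 are marked in that order on the horizontal axis.]
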

The solution is analogous to that given on page 69 for a hypothesis
about a proportion. In the adjacent sketch, the curve
with mean $\mu_H = 75$ represents the sampling distribution of $\bar X$ under
the hypothesis, and the line segment BC is the region of acceptance
for H: μ = 75. This region has probability .98. Then

$$C = 75 + z_{.99}\,\sigma_{\bar X} \quad\text{and}\quad B = 75 + z_{.01}\,\sigma_{\bar X}.$$


If μ = 80 instead of 75, the curve on the right represents the actual
sampling distribution of $\bar X$ and the area under this curve to the
left of the ordinate at C is the probability of false acceptance of
H: μ = 75 when μ = 80. Since we have said this probability is
not to be larger than β = .40, the point C must be $z_{.40}\,\sigma_{\bar X}$ units
from 80, or $C = 80 + z_{.40}\,\sigma_{\bar X}$. As β will always be chosen to be
less than .50, $z_\beta$ will always be negative. If μ is actually larger
than 80, β will be even smaller than if μ = 80, and so the requirements
of the problem are fulfilled.


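Then $C = 75 + z_{.99}\,\sigma_{\bar X}$ and also $C = 80 + z_{.40}\,\sigma_{\bar X}$.

Hence $75 + z_{.99}\,\sigma_{\bar X} = 80 + z_{.40}\,\sigma_{\bar X}$

or $$80 - 75 = \sigma_{\bar X}(z_{.99} - z_{.40}) = -\sigma_{\bar X}(z_{.01} + z_{.40})$$

$$5 = \frac{8}{\sqrt{N}}(2.326 + .253)$$

and $$N = \left(\frac{8(2.326 + .253)}{5}\right)^2 = 17.0$$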


If $\mu_A$ should be larger than 80, the curve representing the
actual sampling distribution will be farther to the right so β will
be less than .40 and the computed value of N will be less than 17.
Taking N = 17 will then amply satisfy the requirements of the
problem.

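The argument for $\mu_A = 70$ is similar. The curve representing
the actual sampling distribution is to the left and not shown in
the diagram.

Then $B = 75 + z_{.01}\,\sigma_{\bar X}$ and $B = 70 + z_{.60}\,\sigma_{\bar X}$

and therefore $75 + z_{.01}\,\sigma_{\bar X} = 70 + z_{.60}\,\sigma_{\bar X}$

$$75 - 70 = \sigma_{\bar X}(z_{.60} - z_{.01})$$

$$5 = \frac{8}{\sqrt{N}}(.253 + 2.326)$$

and $$N = \left(\frac{8(2.579)}{5}\right)^2 = 17.0 \text{ as before.}$$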


If a one-sided test had been used, the only difference would
have been in the position of C with respect to $\mu_H$. Its position
with respect to $\mu_A$ would be unchanged. Then $C = 75 + z_{.98}\,\sigma_{\bar X}$,

and $$N = \left(\frac{8(2.054 + .253)}{5}\right)^2 = 14$$


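In general if $d = \mu_A - \mu_H$ is the departure from hypothesis
which it is desired to detect, N may be calculated as follows:

(7.30) For the two-sided test, $$N = \left[\frac{\sigma}{d}\left(z_{\frac{1}{2}\alpha} + z_\beta\right)\right]^2 = \frac{\sigma^2}{d^2}\left(z_{\frac{1}{2}\alpha} + z_\beta\right)^2$$

(7.31) and for the one-sided test, $$N = \left[\frac{\sigma}{d}\left(z_{\alpha} + z_\beta\right)\right]^2 = \frac{\sigma^2}{d^2}\left(z_{\alpha} + z_\beta\right)^2$$

Both formulas indicate that as the discrepancy becomes larger it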
can be detected with a smaller sample and as d becomes very 
small only a large sample can cause rejection of the hypothesis 
when it is false. 
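
The arithmetic of Formulas (7.30) and (7.31) can be sketched in a few lines of Python. This is only an illustration: the function name `cases_needed` and its argument names are assumptions introduced here, and the normal deviates come from the standard library rather than from the tables used in the text, so results may differ slightly in the last decimal.

```python
from statistics import NormalDist


def cases_needed(sigma, d, alpha, beta, two_sided=True):
    """Sample size N from Formula (7.30) (two-sided) or (7.31) (one-sided).

    sigma: guessed population standard deviation
    d:     departure from the hypothesis that should be detected
    alpha: risk of rejecting a true hypothesis
    beta:  risk of accepting the hypothesis when the departure is d
    """
    z = NormalDist().inv_cdf           # normal deviate cutting off a lower-tail area
    tail = alpha / 2 if two_sided else alpha
    # z(tail) and z(beta) are both negative when tail and beta are below .50,
    # so their sum is squared exactly as in the formulas.
    return (sigma / d * (z(tail) + z(beta))) ** 2


# The worked example: sigma = 8, d = 5, alpha = .02, beta = .40
print(round(cases_needed(8, 5, 0.02, 0.40), 1))                   # two-sided test
print(round(cases_needed(8, 5, 0.02, 0.40, two_sided=False), 1))  # one-sided test
```

The function returns the unrounded value so that the reader can decide how to round; in practice one would take the next whole number of cases.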

Now suppose we do not know the value of σ but we have two
or more estimates available, of which $\sigma_1$ is the smallest and $\sigma_2$ the
largest. If we substitute $\sigma_1$ for σ in the formula we shall obtain
an estimate $N_1$ of the necessary sample size. If we substitute
$\sigma_2$ for σ we shall obtain a second estimate $N_2$ larger than $N_1$.
While it cannot be guaranteed that either sample size will exactly 
meet the specifications of the problem, these numbers provide 
estimates which are a vastly better guide to research planning 
than the blind guess all too often followed. 

Suppose there are two populations presumed to have the same
variance σ² and we wish to test the hypothesis $\mu_1 - \mu_2 = 0$, with
α and β chosen arbitrarily as before. How many cases shall be
taken in each sample? If the cost of obtaining a single case is
the same for each sample, the best procedure is to take samples
of the same size, $N_1 = N_2 = N$, and to determine the number of
cases in each by the formula

(7.32) $$N = \frac{2\sigma^2}{d^2}\left(z_\alpha + z_\beta\right)^2 \quad\text{for the one-sided test.}$$

The formula for the two-sided test is similar with ½α substituted
for α.

Suppose the two populations are presumed to have unequal
variances and these are estimated to be $\sigma_1^2$ and $\sigma_2^2$. Again we
wish to test the hypothesis $\mu_1 = \mu_2$. If the cost of obtaining a
single case is the same for each sample, the best procedure is to
make sample size proportional to the standard deviations and
to let

(7.33) $$N_1 = \frac{\sigma_1^2 + \sigma_1\sigma_2}{d^2}\left(z_\alpha + z_\beta\right)^2$$

(7.34) and $$N_2 = \frac{\sigma_1\sigma_2 + \sigma_2^2}{d^2}\left(z_\alpha + z_\beta\right)^2$$

for the one-sided test. In the corresponding formula for the two-sided
test, $z_{\frac{1}{2}\alpha}$ takes the place of $z_\alpha$.
By way of illustration, suppose the person planning a research
study estimates $\sigma_1 = 5$ and $\sigma_2 = 8$ as a fair approximation for the
standard deviations of the two populations, and chooses α = .01
and β = .20. He thinks it important to be able to detect a difference
between the means if it is 1.5 or larger. The statistician
then estimates that the number of cases to be taken from the two
populations should be

$$N_1 = \frac{25 + 40}{(1.5)^2}(2.326 + .842)^2 = 290$$

$$N_2 = \frac{40 + 64}{(1.5)^2}(2.326 + .842)^2 = 464$$

and verifies that $N_1/N_2 = 5/8$.

If the person paying for the study feels that $N = N_1 + N_2 = 754$
is larger than he can afford, he may revise his choice of α, of β, or
of d. The ratio $N_1/N_2$ will remain 5/8 in any case.


Verify the following results if $\sigma_1 = 5$, $\sigma_2 = 8$ and a one-tailed
test is used:
If α = .05, β = .20 and d = 1.5, $N_1 = 179$, $N_2 = 286$ and N = 465
If α = .01, β = .30 and d = 1.5, $N_1 = 235$, $N_2 = 375$ and N = 610
If α = .01, β = .20 and d = 2, $N_1 = 163$, $N_2 = 261$ and N = 424
If α = .05, β = .30 and d = 2, $N_1 = 76$, $N_2 = 122$ and N = 198
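
A short Python sketch may help in checking such results. It assumes the one-sided forms of Formulas (7.33) and (7.34) as illustrated by the worked example above; the function name `cases_two_samples` is hypothetical, and normal deviates are computed by the standard library instead of being read to three decimals from a table.

```python
from statistics import NormalDist


def cases_two_samples(sigma1, sigma2, d, alpha, beta, two_sided=False):
    """Cases to take from each population when the variances are unequal.

    Sample sizes are made proportional to the standard deviations,
    following Formulas (7.33) and (7.34).
    """
    z = NormalDist().inv_cdf
    tail = alpha / 2 if two_sided else alpha
    k = (z(tail) + z(beta)) ** 2 / d ** 2
    n1 = (sigma1 ** 2 + sigma1 * sigma2) * k
    n2 = (sigma1 * sigma2 + sigma2 ** 2) * k
    return round(n1), round(n2)


# Illustration from the text: sigma1 = 5, sigma2 = 8, alpha = .01, beta = .20, d = 1.5
n1, n2 = cases_two_samples(5, 8, 1.5, 0.01, 0.20)
print(n1, n2, n1 + n2)
```

Because the deviates here are carried to more decimals than a printed table, an occasional result may differ by one case from a value computed with three-decimal table entries.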


Standard Error of $\bar X$ in Samples from a Finite Population.
The formula $\sigma_{\bar X} = \sigma/\sqrt{N}$ is correct when the sample is small in comparison
with the population. This situation is usually true of
experimental studies. In such studies the population which is
sampled may not even be entirely in existence at the time of
the experiment. Thus, the results of a study on children may be
applied to children not yet born at the time of the study. In
experimental studies the populations are usually regarded as
infinitely large.

Another type of study is one which applies to a well-defined
population of finite size. The purpose of the investigation may be
to determine the average score of students in a particular college
on a scale measuring their attitude toward a current college policy.
Suppose the college population consists of M students. If the
scale is given to all students then the mean and variance of the
scores can be ascertained exactly by the formulas

(7.35) $$\mu = \frac{\sum_{i=1}^{M} X_i}{M}$$

(7.36) $$\sigma^2 = \frac{\sum_{i=1}^{M} (X_i - \mu)^2}{M}$$



If the purpose of the study is to measure only the attitude of
the students actually enrolled in the college the mean in Formula
(7.35) provides the answer without sampling error. The standard
error of μ is zero. There may, of course, be other errors, such as
unreliability of the scale, but sampling error has been eliminated
by using the entire population.

It may be uneconomical or otherwise undesirable to apply the
scale to all M students. A sample of N may, therefore, be selected
at random. An average calculated from this sample is subject
to sampling error and a standard error formula is required. It is
to be expected that the formula will reflect not only the size of
the sample but also this size in relation to the size of the population
sampled. If the sample mean is written $\bar X$ then

(7.37) $$\sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}\sqrt{\frac{M - N}{M - 1}}$$

where σ is defined in Formula (7.36). Since σ is not known the
sample estimate s is used and

(7.38) $$s_{\bar X} = \frac{s}{\sqrt{N}}\sqrt{\frac{M - N}{M - 1}}$$



An examination of Formula (7.37) shows that $\sigma_{\bar X} = 0$ when
N = M, as is to be expected where the sample consists of the entire
population. If N is small in comparison to M, the factor $\sqrt{\frac{M - N}{M - 1}}$
is close to 1 and may be disregarded. This leads to the formula for
the standard error of a mean from an infinite population.

If M is large, M − 1 may be replaced by M, and the factor
$\sqrt{\frac{M - N}{M - 1}}$ may be written with close approximation as $\sqrt{1 - \frac{N}{M}} = \sqrt{1 - f}$,
where f is the fraction of the population included in the sample.
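
The effect of the finite-population factor is easy to see numerically. The following Python sketch assumes Formula (7.38) in the form $s_{\bar X} = (s/\sqrt{N})\sqrt{(M - N)/(M - 1)}$; the function name and the figures in the usage lines are hypothetical and serve only as illustration.

```python
from math import sqrt


def se_mean_finite(s, n, m):
    """Standard error of a sample mean for a sample of n cases drawn at
    random from a finite population of m cases (Formula 7.38)."""
    if not 1 <= n <= m or m < 2:
        raise ValueError("need 1 <= n <= m and m >= 2")
    return s / sqrt(n) * sqrt((m - n) / (m - 1))


# Hypothetical figures: s = 10, sample of 100 from populations of several sizes
for m in (100, 200, 1000, 100000):
    print(m, round(se_mean_finite(10, 100, m), 3))
```

When the sample is the whole population the result is zero, and as M grows the value approaches the infinite-population standard error $s/\sqrt{N}$.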


[...] reduced if finiteness of the population is taken into account [...]

3. [...] to make the standard error [...] of its value [...] infinite population?

[...] Given $N_1 = 6$, $\bar X_1 = 205$, $s_1^2$ = [...]

$N_2 = 5$, $\bar X_2 = 251$, $s_2^2$ = [...]

Compute t to test H: $\mu_1 - \mu_2 = 0$, with [...] test and α = .01

a. Using Formulas (7.21) and (7.22) with n = [...]

b. Using Formula (7.25) with n = [...]

[...] a sample of 12 cases [...]
[...] = 35 with α = .05 using a two-sided test.
[...] = 35 with α = .05 using a one-sided test with right tail [...]
with the mean gain for a similar period published as a norm by the
author of the test. 

b. From each of 6 litters of white rats 2 rats of the same sex and 
approximately equal weight are selected. One of these is fed diet A 
and the other diet B. At the end of the experimental period it is 
desired to test the hypothesis that the diets produce equal gains 
in weight. 

c. For two independent samples the following data are obtained:

$N_1 = 325$, $\bar X_1 = 52.3$, $s_1^2 = 24.2$, $N_2 = 111$, $\bar X_2 = 54.7$, $s_2^2 = 23.6$.
A test of H: $\sigma_1^2 = \sigma_2^2$ has been made and the hypothesis accepted.

d. For two independent samples the following data are obtained:

$N_1 = 14$, $\bar X_1 = 26.3$, $s_1^2 = 18.5$, $N_2 = 19$, $\bar X_2 = 25.4$, $s_2^2 = 18.8$.
A test of H: $\sigma_1^2 = \sigma_2^2$ has been made and the hypothesis accepted.

e. For two independent samples the following data are obtained:
$N_1 = 390$, $\bar X_1 = 63.2$, $s_1^2 = 10.3$, $N_2 = 406$, $\bar X_2 = 63.2$, and $s_2^2 = 38.3$.
A test of H: $\sigma_1^2 = \sigma_2^2$ is made and the hypothesis rejected.

f. For two independent samples, the following data are obtained:
$N_1 = 5$, $\bar X_1 = 19.2$, $s_1^2 = 42$, $N_2 = 9$, $\bar X_2 = 24.3$, $s_2^2 = 36$.
A test of the hypothesis $\sigma_1^2 = \sigma_2^2$ has been made and the hypothesis
rejected.

9. A research worker is considering the following plans for a two-sided
test of H: μ = 70:

(1) N = 40, α = .01          (4) N = 200, α = .05
(2) N = 100, α = .01         (5) N = 500, α = .05
(3) N = 200, α = .02         (6) N = 500, α = .06

a. Which one will give him the most powerful test?

b. Which will give him the least powerful test?

c. Which will incur the least risk of rejecting the hypothesis if it is
true?

d. If he accepts the hypothesis, which will allow him to make the
strongest statement?

10. For each of the following requirements, ascertain how many cases
should be included in the sample or samples. Use a one-sided test.

1. Hypothesis: $\mu_H = 25$
   Alternative: $\mu_A = 23$
   σ = 12, α = .01, β = .30

2. Hypothesis: $\mu_1 - \mu_2 = 0$
   Alternative: $\mu_1 - \mu_2 = 3$
   $\sigma_1 = 15$, $\sigma_2 = 15$, α = .02, β = .20

3. Hypothesis: $\mu_1 - \mu_2 = 0$
   Alternative: $\mu_1 - \mu_2 = 2$
   $\sigma_1 = 12$, $\sigma_2 = 20$, α = .05, β = .40

4. Hypothesis: $\mu_1 - \mu_2 = 3$
   Alternative: $\mu_1 - \mu_2 = 6$
   $\sigma_1 = 5$, $\sigma_2 = 5$, α = .02, β = .20

5. Hypothesis: $\mu_1 - \mu_2 = 10$
   Alternative: $\mu_1 - \mu_2 = 5$
   $\sigma_1 = 10$, $\sigma_2 = 20$, α = .05, β = .40
Design of Samples in Surveys. One of the most crucial aspects
of a sample survey is the decision as to the number of cases and
the methods by which they shall be selected. The answer to this
question provides the design of the survey. This and other problems
related to surveys are discussed by Deming and Yates.

Up to this point, the discussion of sampling error has assumed
that each element in the sample is drawn at random from the
entire population. This method of sampling is called unrestricted
random sampling or sometimes simple sampling. In some situations,
unrestricted random sampling is impossible. In other
situations an alternative method may be more advantageous.

The first requirement of any sampling procedure is the avoid- 
ance of bias. The error which causes a statistic to differ from its 
parameter may arise partly from random errors in the selection of 
individuals and partly from bias in their selection. The total of 
the random errors decreases as sample size increases but the total 
of errors due to bias does not so decrease. It forms a constant 
error which on the average is about the same for a large as for 
a small sample. 

The next requirement of a “good” sampling procedure is that 
it provide as much information about the population as can be 
obtained at a stipulated cost or that it provide a stipulated amount 
of information for as low a cost as possible. This requirement 
raises problems concerning the size of sample, the method of se- 
lecting cases, and the allocation of cases. Adequate discussion 
of these problems would require a large monograph and is out- 
side the scope of this book. Here we shall make brief mention 
of several alternatives to unrestricted random sampling, and shall 
give the standard error of the mean and related formulas for two 
of them. 

The advantages of a particular method can be more easily 
discussed if we first consider the meaning of the terms frame and 
sampling unit. The following quotation is from pages 20 and 21
of Yates.


“All rigorous sampling demands a subdivision of the material to be 
sampled into units, termed sampling units, which form the basis of the 
actual sampling procedure. These units may be natural units of the 
material, such as individuals in a human population, or natural aggregates 
of such units, such as households, or they may be artificial units, such as 
rectangular areas on a map, bearing no relation to the natural subdivisions 
of the material. 




“It is not always necessary to make an actual subdivision of the whole
of the material before selection of the sample, provided the selected units 
can be clearly and unambiguously defined. Thus, with sampling units 
which are rectangular areas on a map there is no need to demarcate all 
these areas; they can be defined by co-ordinates, and the selected areas 
demarcated after selection. 

“Clear and unambiguous definition demands the existence or con- 
struction of some form of frame. In the sampling of a human population, 
for instance, with households as sampling units, there must be available 
a list of all households, and this list must be such that any household 
selected from it can be unambiguously located. In area sampling from 
maps, the maps must be such that the selected areas can be unambiguously 
defined on the ground. .. . 

“Sampling units may be of the same or differing size. They may contain 
the same, or approximately the same, number of natural units, or they may 
contain widely differing numbers. The whole procedure of sampling, in- 
cluding the estimation of the population values and the sampling errors, is 
simplest when the sampling units are of approximately the same size and 
contain approximately the same number of natural units. Often, how- 
ever, the material is such that this condition cannot be conveniently fulfilled.
In particular, if the natural units are themselves of widely differing
size, variation in size of the sampling units or in the number of natural units
they contain is inevitable.”


Among the more common alternatives to unrestricted random 
sampling are these: 

A. Conscious selection of “typical” cases. This unsatisfactory
method is in widespread use. The literature is full of instances
showing bias of one kind or another in such samples. For example,
Yates (p. 12) spread out some 1200 stones on a table and asked
12 observers to select three samples of 20 stones each which should 
represent as nearly as possible the size distribution of the whole 
collection. Of the 36 samples, 30 had means larger than the mean 
of the collection. 

B. Systematic selection from a list. A very frequent method 
of drawing a sample is to take every 10th or every 20th, or in 
general every rth name, on a list. If the first name is chosen 
at random from among the first r names, every name has the same 
opportunity of being included in the sample and that is important. 
However, not all samples are equally likely; in fact some samples
would have probability zero. Suppose the list is alphabetical and
includes several members of the same family with the same surname.
If one of these is selected the others almost certainly will
not be, and so observations are not independent. This method
is often far less expensive than unrestricted random sampling and 
it is usually free of bias, unless there are periodic features in the 
list which coincide with the sampling interval. It is usually not 
possible to get a proper estimate of the standard error of a statistic 
from such a sample; the ordinary formulas applicable to unre- 
stricted random sampling may be presumed to overestimate 
sampling variability. 

C. Sampling after stratification of the population. The population
is subdivided into a number of groups or “strata” and a
random sample selected from each stratum. It is most efficient 
when the strata are so defined that there is a high degree of uni- 
formity among the units within each stratum and considerable 
diversity from one stratum to another. It has two purposes: 
(1) to increase the accuracy of the population estimates and (2) to 
assure adequate sampling from any subgroup which is of particular
interest. The stratification affects the variability of any
computed statistics, as indicated on pages 174 and 175.

D. Cluster sampling. Here the sampling units are themselves 
groups or clusters of natural units. Thus the sampling units might 
be households composed of individual persons; or city blocks 
composed of dwelling units; or townships composed of farms; or 
classes composed of students; or groups of telephone poles erected 
near each other, and so on. A random sample is taken of all the 
clusters composing the population and observations are made on 
all the individuals in the selected clusters. Cluster sampling is
discussed further on pages 175-177.

E. Multistage sampling. This term connotes a more complex 
plan with greater flexibility of pattern than the preceding. The 
sampling is carried out in successive stages, and any of the pre- 
ceding plans may be applied at any of the stages. For instance 
in the first stage a sample of clusters might be chosen at random 
from the entire population and then in the second stage a sample 
of individuals be chosen at random from the selected clusters. 
There are a great many possible variants of multistage sampling 
and each has its own set of formulas for sampling variability. 
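
Of the plans listed above, systematic selection from a list (plan B) is the easiest to sketch in code. The Python fragment below is a minimal illustration under the description given there: the first name is chosen at random from among the first r names and every rth name is taken thereafter. The function name and the list of numbered "names" are hypothetical.

```python
import random


def systematic_sample(names, r, rng=random):
    """Take every r-th name from a list, starting at a position chosen
    at random from among the first r names."""
    if r < 1:
        raise ValueError("r must be at least 1")
    start = rng.randrange(r)   # random start: 0, 1, ..., r - 1
    return names[start::r]


# Hypothetical list of 100 numbered names, sampling every 10th
roster = list(range(1, 101))
print(systematic_sample(roster, 10))
```

Only r distinct samples can ever be drawn this way, which is why not all samples are equally likely even though every name has the same chance of inclusion.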

Stratified Sampling.* This procedure is possible only when
there is some previous information about the population. It is
efficient only if the strata or subpopulations can be selected in such
a way that the variance of each stratum is smaller than the variance
of the entire unstratified population.

* For a minimum course this and the succeeding sections of this chapter may be
omitted without impairing the clarity of subsequent chapters.

Stratification of a community is often on a geographical basis, 
as when the residents of a city are classified by census tract; or 
the residents of a state are classified according to whether they 
live in open country, in villages, or in cities of various specified 
size. Stratification may be by some easily identified characteristic 
of individuals which is presumed to be related to the purpose of
the inquiry.

Suppose that a finite population has been divided into strata
and that

k = the number of strata
$M_i$ = the number of individuals in the ith stratum
$M = \sum_{i=1}^{k} M_i$ = the number of individuals in the population
$N_i$ = the number of individuals in the sample taken from the
ith stratum
$N = \sum_{i=1}^{k} N_i$ = the number of individuals in the sample
$\bar X_i = \frac{\sum X}{N_i}$ = mean of the observations on the $N_i$ individuals
in the sample from the ith stratum
$s_i^2 = \frac{\sum (X - \bar X_i)^2}{N_i - 1}$ = variance of the observations on the $N_i$ individuals
in the sample from the ith stratum

Then μ, the mean of the population, is estimated by the formula

(7.39) $$\bar X_s = \frac{\sum M_i \bar X_i}{M}$$

and $\mu = E(\bar X_s)$.

The variance of $\bar X_s$ is estimated by the formula

(7.40) $$s^2_{\bar X_s} = \sum \left(\frac{M_i}{M}\right)^2 \frac{M_i - N_i}{M_i - 1} \cdot \frac{s_i^2}{N_i}$$



Quite often the information sought is not an estimate of the
average μ but of the total Mμ. That estimate is

(7.41) $$T_s = M\bar X_s = \sum M_i \bar X_i$$

and Mμ is the expected value of $T_s$. The variance of $T_s$ is estimated
as

(7.42) $$s^2_{T_s} = \sum M_i^2\, \frac{M_i - N_i}{M_i - 1} \cdot \frac{s_i^2}{N_i}$$

Sometimes the information sought is an estimate of the proportion
P of individuals in the population possessing some characteristic,
say the proportion of homes in a rural area that have
central heating, or the proportion of voters who will express a
preference for a given presidential candidate. The estimate of P
for the entire population is then

(7.43) $$p = \frac{\sum M_i p_i}{M}$$

and the variance of p is

(7.44) $$s_p^2 = \sum \left(\frac{M_i}{M}\right)^2 \frac{M_i - N_i}{M_i - 1} \cdot \frac{p_i q_i}{N_i}$$


For large samples the various estimates described above are 
approximately normally distributed about their respective param- 
eters as mean, with the square root of the indicated variance as 
standard deviation. Therefore, confidence intervals for the param- 
eters can be obtained by use of tables of the normal distribution. 
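
As a rough illustration of how these estimates fit together, the following Python sketch computes the stratified estimate of the mean, its estimated variance, and a normal-theory confidence interval. It assumes Formula (7.39) as the weighted mean $\sum M_i \bar X_i / M$ and Formula (7.40) as the weighted sum of stratum variances with the finite-population factor $(M_i - N_i)/(M_i - 1)$. The function name and the tiny data set are hypothetical; so small a sample shows only the arithmetic, since the normal approximation is intended for large samples.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def stratified_estimate(strata, confidence=0.95):
    """strata: list of (M_i, observations) pairs, one pair per stratum,
    where M_i is the stratum size in the population."""
    M = sum(M_i for M_i, _ in strata)
    # Estimate of the population mean: stratum means weighted by M_i.
    xbar_s = sum(M_i * mean(obs) for M_i, obs in strata) / M
    # Estimated variance of that mean; variance() divides by N_i - 1.
    var_s = sum(
        (M_i / M) ** 2 * (M_i - len(obs)) / (M_i - 1) * variance(obs) / len(obs)
        for M_i, obs in strata
    )
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(var_s)
    return xbar_s, var_s, (xbar_s - half_width, xbar_s + half_width)


# Hypothetical population of 400 in two strata of 100 and 300 individuals
strata = [(100, [10, 12, 14]), (300, [20, 22, 24, 26])]
estimate, var_est, interval = stratified_estimate(strata)
print(estimate, round(var_est, 4), interval)
```

Multiplying the estimate by M gives the estimated total $T_s$, and multiplying the variance by $M^2$ gives its estimated variance.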

The problem of how large a sample to take and how to appor- 
tion the cases among the strata depends upon such considerations 
as the amount of money which can be spent on the collection of 
data, the cost of locating an individual, the cost of making an 
observation on an individual, the degree of precision required in 
the estimate and the relative variability of the different strata. 
Formulas for the optimum allocation of cases to strata will not be 
presented here because the discussion necessary to safeguard their 
use requires more space than the scope of this book justifies. A 
clear presentation of these is given by David. The reader who
contemplates making a survey by sampling should also read 
Cochran, Deming, Hansen, King, Marks, Neyman, Stephan, and 
Yates. (See references at the end of this chapter.) 

Cluster Sampling. For this method the population is divided 
into many relatively small groups or clusters of individuals and 
the sample consists of a number of these clusters chosen at random. 

In the formulas given here it will be assumed that all the in- 
dividuals are observed in those clusters which are chosen, but this 
need not be so. It often happens that a number of clusters is 
selected at random and then out of those selected clusters, a pre- 
determined number of individuals is chosen at random. 


Cluster sampling is economical when the cost of measuring
an individual is relatively small and the cost of reaching him 
relatively large. For example, suppose a railroad is making a 
survey of the ties on which rails rest in order to assess the need 
for replacements. To take an unrestricted random sample of all
the ties in use would entail heavy costs to travel to the place where
the selected ties might be found but very little cost to examine
the ties once they are reached. It might cost less and yield more 
information to measure 100 clusters of 5 ties each than to measure 
200 ties chosen individually at random. It would produce more 
information to measure 100 clusters of 5 ties each than to measure 
5 clusters of 100 ties each. The relation of costs to sampling
procedure is outside the scope of this text, but it is well worth the
careful investigation of anyone contemplating a large inquiry
by sample.

Cluster sampling is most advantageous when there is great 
heterogeneity within clusters. If all the individuals in each cluster 
were exactly alike there would be no advantage in observing more 
than one of them. The larger the variation within clusters, the 
smaller the number of individuals which will be required to hold 
$s_{\bar X_c}$ to a predetermined size.

The formulas for cluster sampling rather than for simple sam- 
pling are usually needed in studies that utilize area sampling, as 
when small areas like city blocks or rural townships are chosen at 
random and every dwelling in those areas, or every farm, or every 
individual under consideration, is included in the sample. These 
formulas would be needed if a population of all the eighth grade 
pupils in New York City were to be sampled by selecting at ran- 
dom a number of eighth grade classes and making observations 
on all the pupils in each class. 


Let M = the number of clusters in the population
m = the number of clusters in the sample
$n_i$ = the number of individuals in the ith cluster
$\bar X_i = \frac{\sum X}{n_i}$ = mean of the ith cluster
$\bar n = \frac{\sum n_i}{m}$ = average number of individuals per cluster
$N = \sum n_i = m\bar n$ = number of individuals in the sample

Then the sample estimate of the population mean μ is

(7.45) $$\bar X_c = \frac{\sum n_i \bar X_i}{\sum n_i}$$

and the variance of $\bar X_c$ is

(7.46) $$s^2_{\bar X_c} = \frac{M - m}{Mm} \cdot \frac{\sum n_i^2 (\bar X_i - \bar X_c)^2}{(m - 1)\,\bar n^2}$$

In the special case for which $n_1 = n_2 = \cdots = n_m = n$, Formulas
(7.45) and (7.46) can be simplified. Then the estimate of μ is

(7.47) $$\bar X_c = \frac{\sum \bar X_i}{m}$$

and its variance is

(7.48) $$s^2_{\bar X_c} = \frac{M - m}{Mm} \cdot \frac{\sum (\bar X_i - \bar X_c)^2}{m - 1}$$

In dealing with proportions in samples obtained by cluster
sampling it is only necessary to substitute $p_i$ for $\bar X_i$ and $p$ for $\bar X_c$
in Formulas (7.45) to (7.48).
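
The cluster computations can be sketched in Python as follows. This is a hedged illustration, not a definitive implementation: it assumes that the estimate of the mean is the size-weighted average of the cluster means and that its variance has the form $\frac{M - m}{Mm} \cdot \frac{\sum n_i^2(\bar X_i - \bar X_c)^2}{(m - 1)\bar n^2}$, which should be checked against Formula (7.46) before use. The function name and the data are hypothetical.

```python
from statistics import mean


def cluster_estimate(clusters, M):
    """clusters: list of lists, each holding every observation in one
    sampled cluster; M: number of clusters in the population."""
    m = len(clusters)
    sizes = [len(c) for c in clusters]
    means = [mean(c) for c in clusters]
    N = sum(sizes)                      # individuals in the sample
    nbar = N / m                        # average cluster size
    # Size-weighted mean of the cluster means.
    xbar_c = sum(n * xb for n, xb in zip(sizes, means)) / N
    # Assumed variance form; reduces to the equal-size case when all n_i agree.
    ss = sum(n ** 2 * (xb - xbar_c) ** 2 for n, xb in zip(sizes, means))
    var_c = (M - m) / (M * m) * ss / ((m - 1) * nbar ** 2)
    return xbar_c, var_c


# Hypothetical sample of 3 clusters out of a population of 30 clusters
sample = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
print(cluster_estimate(sample, M=30))
```

Note that only the variation among cluster means enters the variance, which is why heterogeneity within clusters makes this plan more efficient.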

The use of cluster sampling in relation to mental testing is
discussed by Marks, who gives a derivation of the formulas for
cluster sampling used above.
8 Inferences Concerning Variances
and Standard Deviations of
Normal Populations


Not only do research workers ask questions about means of
populations but they ask questions about their variabilities also. 
What estimate can be made of the unknown variance of a popula- 
tion from the observed variance of a sample? Does a period of 
study make a class more variable at the end than at the beginning? 
Less variable? Are boys and girls equally variable in some par- 
ticular trait? If one sample is drawn from each of several popula- 
tions, are the observed variances of these samples consistent with 
the hypothesis that the population variances are all equal? 

Such questions are similar to the questions asked concerning 
means in Chapter 7 and similar logic is used in answering them. 
However the appropriate statistics are different and will have 
different sampling distributions. The new issues to be discussed 
in this chapter therefore will relate to the selection of a statistic, 
to the identification of the form of the sampling distribution of 
that statistic, and to the reading of the appropriate probability 
tables. The general logic of making interval estimates or testing 
hypotheses on the evidence of such statistics will follow the pat- 
terns already established. 

Sampling Distribution of the Statistic $(N - 1)s^2/\sigma^2$. In the
sampling experiment carried out in Chapter 6, each student computed
four values of the statistic $\sum x^2/\sigma^2$ or $(N - 1)s^2/\sigma^2$. (See
line 9 of Worksheet II.) In Table 6.1, on page 134, and Figure
6-2, on page 134, the empirical distribution of 125 such computed
values is shown. It can be proved mathematically that the statistic

$$\frac{\sum x^2}{\sigma^2} = \frac{(N - 1)s^2}{\sigma^2}$$

has a $\chi^2$ distribution with N − 1 degrees of freedom.

If a large number of students draw samples of 5 cases from the
data of Table XXIV and compute the variance for each sample,
what proportion of these samples would yield a variance of 1500
or more? Since σ² is known to be 1380.4 for the population, the
relevant statistic is

$$\chi^2 = \frac{4(1500)}{1380.4} = 4.35$$

From Table VIII it may be seen that $4.35 < \chi^2_{.75}$. The expected
proportion of samples with $s^2 > 1500$ is therefore greater than .25.

Suppose a student reports that he obtained a variance of
$s^2 = 423$ but his computations are not available for checking.
Should it be assumed that he made a mistake of some sort? His
variance is only about a third as large as the population variance.
Is the observed variance small enough to arouse suspicions concerning
the method of sampling or the correctness of the computations?

$$\chi^2 = \frac{4(423)}{1380.4} = 1.2$$

For 4 degrees of freedom, Table VIII shows $\chi^2_{.10} = 1.1$. Since
10% of random samples of 5 cases from a population in which
σ² = 1380.4 would produce $\chi^2$ smaller than 1.1, more than 10% of
such samples would show $\chi^2$ smaller than the observed value 1.2.
These same samples would of course have $s^2$ smaller than 423,
the observed value from which $\chi^2 = 1.2$ was obtained. Now if
more than 10% of samples would by chance produce a variance
smaller than the one observed, that variance cannot be considered
small enough to excite suspicion as to its correctness. Obviously,
however, this absence of suspicion does not demonstrate the correctness
of the sampling and computational procedures used.
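
The two table look-ups above can be checked directly, because for an even number of degrees of freedom the upper-tail probability of $\chi^2$ has a simple closed form, $P(\chi^2 > x) = e^{-x/2}\sum_{j=0}^{n/2-1} (x/2)^j/j!$. The Python sketch below uses that standard result; the function name is an assumption introduced here, and the code does not handle odd degrees of freedom.

```python
from math import exp


def chi2_upper_tail_even(x, df):
    """P(chi-square > x) for an even number of degrees of freedom df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    term = total = 1.0
    for j in range(1, df // 2):   # accumulates (x/2)^j / j!
        term *= half / j
        total += term
    return exp(-half) * total


sigma2 = 1380.4                    # population variance from the text
for s2 in (1500, 423):
    chi2 = 4 * s2 / sigma2         # (N - 1) s^2 / sigma^2 with N = 5
    upper = chi2_upper_tail_even(chi2, 4)
    print(s2, round(chi2, 2), round(upper, 3), round(1 - upper, 3))
```

For $s^2 = 1500$ the upper-tail proportion comes out well above .25, and for $s^2 = 423$ the lower-tail proportion comes out above .10, in agreement with the conclusions drawn from Table VIII.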

Interval Estimate for the Variance. The argument here is
similar in principle to the argument concerning the interval estimate
for the mean developed in Chapter 7. There is probability
.05 that $(N - 1)s^2/\sigma^2 > \chi^2_{.95}$ and probability .05 that
$(N - 1)s^2/\sigma^2 < \chi^2_{.05}$.
Therefore there is probability .90 that $(N - 1)s^2/\sigma^2$ will be neither
larger than $\chi^2_{.95}$ nor smaller than $\chi^2_{.05}$. This relation may be
expressed in general as

(8.1) $$P\left\{\chi^2_{\frac{1}{2}\alpha} < \frac{(N - 1)s^2}{\sigma^2} < \chi^2_{1-\frac{1}{2}\alpha}\right\} = 1 - \alpha$$


What is wanted now is an inequality expressing limits for σ². This
inequality can be obtained from (8.1) in two steps, or the reader
not interested in the algebraic derivation may merely accept
Formula (8.2). The first step is to take the reciprocal of each
term in (8.1) and reverse the direction of the inequality signs. (By
way of illustration one may note that 4 < 8 < 10 and 1/4 > 1/8 > 1/10
or 1/10 < 1/8 < 1/4. In general 1/b < 1/a if a and b are positive and a < b.)
This step produces the inequality

$$\frac{1}{\chi^2_{\frac{1}{2}\alpha}} > \frac{\sigma^2}{(N - 1)s^2} > \frac{1}{\chi^2_{1-\frac{1}{2}\alpha}}$$

Each member of this inequality is now multiplied by $(N - 1)s^2$.
Since $(N - 1)s^2$ is a positive number, the direction of the signs is
not affected. The result is

$$\frac{(N - 1)s^2}{\chi^2_{1-\frac{1}{2}\alpha}} < \sigma^2 < \frac{(N - 1)s^2}{\chi^2_{\frac{1}{2}\alpha}}$$

This inequality has probability 1 − α or is made with confidence
coefficient 1 − α. In general

(8.2) $$P\left\{\frac{(N - 1)s^2}{\chi^2_{1-\frac{1}{2}\alpha}} < \sigma^2 < \frac{(N - 1)s^2}{\chi^2_{\frac{1}{2}\alpha}}\right\} = 1 - \alpha$$


Table VIII provides the values of $\chi^2$ needed for obtaining interval
estimates made with confidence coefficients of .99, .98, .96, .95,
.90, .80, and .50. In a practical problem these values are likely
to meet all requirements.

Certain of these values of $\chi^2$ for $n = 4$ are shown in Table 8.1
with the appropriate computations for each of the four samples
of Worksheet I. Thus for Sample 1, row 7 of Worksheet I
indicates that $(N-1)s^2 = 7635.2$. This number has been
multiplied in turn by each of the reciprocals shown in the third column
of Table 8.1.
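
The computation just described can be sketched in a few lines. This is an illustration under stated assumptions: the function name `variance_interval` and the dictionary `CHI2_N4` are hypothetical names introduced here, and the tabled $\chi^2$ values are those quoted for $n = 4$. Each limit is $(N-1)s^2$ divided by the appropriate tabled value, which is the same as multiplying by its reciprocal.

```python
def variance_interval(sum_sq, chi2_upper, chi2_lower):
    """Limits for sigma^2 in the form of Formula (8.2).

    sum_sq      -- (N - 1) * s^2 for the sample
    chi2_upper  -- tabled chi-square at cumulative probability 1 - alpha/2
    chi2_lower  -- tabled chi-square at cumulative probability alpha/2
    """
    return sum_sq / chi2_upper, sum_sq / chi2_lower

# Tabled chi-square values for 4 degrees of freedom, keyed by the
# confidence coefficient they produce: (upper point, lower point).
CHI2_N4 = {0.98: (13.277, 0.297), 0.90: (9.488, 0.711), 0.80: (7.779, 1.064)}

sum_sq_sample1 = 7635.2   # (N - 1) s^2 for Sample 1 of Worksheet I
limits = {
    coef: variance_interval(sum_sq_sample1, upper, lower)
    for coef, (upper, lower) in CHI2_N4.items()
}
for coef, (low, high) in limits.items():
    print(f"C({low:.0f} < sigma^2 < {high:.0f}) = {coef}")
```

The lower limit uses the larger tabled value and the upper limit the smaller one, which is why the interval is not symmetrical about $s^2$.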

From examination of Table 8.1 and Figure 8-1, interpretations
can be made similar to the first four and the sixth interpretation
regarding the confidence interval for the mean. However, a change
must be made in the fifth statement. Whereas the sample mean
lies exactly in the middle of the confidence interval for $\mu$, the
sample variance lies much nearer to the lower than to the upper
end of the confidence interval for $\sigma^2$.

If the confidence coefficient were zero the interval would
narrow to a point and that point would be at $\sigma^2 = (N-1)s^2/\chi^2_{.50}$. If a
confidence coefficient of 1.00 were demanded, implying certainty that
$\sigma^2$ is in the interval, the interval would have to be infinitely wide.
TABLE 8.1  Computation of the Statistic $(N-1)s^2/\chi^2$ for Each of the Four
Samples in Worksheet I at Selected Values of Chi-Square.
$n = 4$, $\sigma^2 = 1380.4$

Symbol          Tabulated   Reciprocal     Value of $(N-1)s^2/\chi^2$ for Sample
for $\chi^2$    value of    of tabulated   1           2            3           4
                $\chi^2$    value          (7635.2)*   (14069.2)*   (3246.8)*   (10696)*

$\chi^2_{.99}$   13.277       .0753          575        1059          244         805
$\chi^2_{.95}$    9.488       .1054          805        1483          342        1127
$\chi^2_{.90}$    7.779       .1286          982        1810          417        1376
$\chi^2_{.80}$    5.989       .1670         1275        2350          542        1786
$\chi^2_{.70}$    4.878       .2050         1565        2884          665        2193
$\chi^2_{.30}$    2.195       .4556         3479        6410         1479        4873
$\chi^2_{.20}$    1.649       .6064         4630        8532         1969        6486
$\chi^2_{.10}$    1.064       .9398         7176       13223         3052       10053
$\chi^2_{.05}$     .711      1.4065        10739       19788         4567       15044
$\chi^2_{.01}$     .297      3.3670        25708       47371        10932       36013

* The observed value of $(N-1)s^2$ for the sample.


Fig. 8-1. Interval estimates of $\sigma^2$ made with confidence coefficients
of .40, .60, .80, and .90 from each of four samples of 5 cases and from
composite sample of 20 cases. (Samples as given on Worksheet I.)
Dotted line is for composite sample. Horizontal tic is $s^2$ of sample.


EXERCISE 8.1 


In Table 8.1
1. Verify those values of $\chi^2$ which are available in Table VIII or in
any other table you may have at hand. Values in Table 8.1 are given
with more significant digits than can be read from Table VIII.
2. Verify some of the reciprocals in Column 3.
3. Verify some of the values in the last four columns.
4. Verify the intervals for Sample 1:
$C(575 < \sigma^2 < 25708) = .98$
$C(805 < \sigma^2 < 10739) = .90$
$C(982 < \sigma^2 < 7176) = .80$
$C(1275 < \sigma^2 < 4630) = .60$
$C(1565 < \sigma^2 < 3479) = .40$
5. Verify the intervals shown in Figure 8-1 from the entries in Table 8.1.
6. Compute the intervals for the composite sample and verify them
by comparison with the dotted lines in Figure 8-1. The following values
of $\chi^2$ for $n = 19$ will be needed:

$\chi^2_{.95} = 30.144$      $\chi^2_{.30} = 15.352$
$\chi^2_{.90} = 27.204$      $\chi^2_{.20} = 13.716$
$\chi^2_{.80} = 23.900$      $\chi^2_{.10} = 11.651$
$\chi^2_{.70} = 21.689$      $\chi^2_{.05} = 10.117$


7. For samples 2, 3, and 4, obtain confidence intervals similar to those 
given in question 4 for sample 1. 

8. What proportion of all samples is expected to yield interval
estimates that actually contain $\sigma^2$ when $1 - \alpha = .90$? Of the 4 interval
estimates made from the 4 observed samples, how many did contain $\sigma^2$?
Answer the same questions for confidence coefficients of .80, .60, and .40.

9. If the standard deviation of a population is 15.4, what is the prob- 
ability that a sample of 18 cases will have a standard deviation at least as 
large as 20? (Hint: Square the standard deviation and work with the 
corresponding value of the variance.) 

10. If a sample of 24 cases has a standard deviation of 5.6, what 
interval limits can be set for the population standard deviation with 
confidence of .90? 


The Chi-square Distribution When the Number of Degrees
of Freedom is Large. The largest value of $n$ found in Table VIII
is 30.* How are tests about the variance to be made if $n > 30$?
Figure 4-5 has already suggested that as $n$ increases the
distribution of $\chi^2$ becomes more symmetrical. The distribution of $\sqrt{\chi^2}$
or $\chi$ approaches symmetry much more rapidly than does the
distribution of $\chi^2$. In fact when $n > 30$, the distribution of $\chi$ is
approximately normal with mean $\sqrt{n - .5}$ and standard error
$\sqrt{.5}$. Then

$z = \dfrac{\chi - \sqrt{n - .5}}{\sqrt{.5}} = \chi\sqrt{2} - \sqrt{2n - 1}$

has approximately a unit normal distribution for large $n$.
Changing $\chi\sqrt{2}$ into $\sqrt{2\chi^2}$, this relationship becomes

(8.3)  $z = \sqrt{2\chi^2} - \sqrt{2n - 1}$

Formula (8.3) can be used to make tests about $\sigma$, and consequently
also to make tests about $\sigma^2$, from a large sample.

* The volume of statistical tables soon to be published by the Biometrika office under
the editorship of E. S. Pearson and H. O. Hartley will contain a $\chi^2$ table extending to
$n$ = 40, 50, 60, 70, 80, 90, and 100.
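
A short sketch may help show how Formula (8.3) is used in both directions. The function names below are hypothetical, and $n = 40$ is chosen only for illustration; the normal percentage point comes from Python's standard `statistics.NormalDist`. The second function simply solves (8.3) for $\chi^2$, which gives an approximate tabled point when the table does not extend far enough.

```python
import math
from statistics import NormalDist

def z_from_chi2(chi2, n):
    """Formula (8.3): approximate unit-normal deviate for chi-square with n d.f."""
    return math.sqrt(2.0 * chi2) - math.sqrt(2.0 * n - 1.0)

def approx_chi2_point(p, n):
    """Solve (8.3) for chi-square: approximate value with cumulative probability p."""
    z = NormalDist().inv_cdf(p)
    return 0.5 * (z + math.sqrt(2.0 * n - 1.0)) ** 2

n = 40                                  # hypothetical degrees of freedom
point = approx_chi2_point(0.95, n)      # approximate 95th percentile of chi-square
print(round(point, 2))
print(round(z_from_chi2(point, n), 3))  # recovers the normal deviate used above
```

The approximation is meant for large $n$ only; for $n$ of 30 or less the tabled values should be preferred.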
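Large Sample Estimate for the Standard Deviation. Formula
(8.3) can be used for this purpose, substituting $\chi^2 = \dfrac{(N-1)s^2}{\sigma^2}$ so that
$\sqrt{2\chi^2} = \dfrac{s}{\sigma}\sqrt{2N - 2}$, and $n = N - 1$ so that $\sqrt{2n - 1} = \sqrt{2N - 3}$.
Then

(8.4)  $P\left\{z_{\alpha/2} < \dfrac{s}{\sigma}\sqrt{2N-2} - \sqrt{2N-3} < z_{1-\alpha/2}\right\} = 1 - \alpha$

By algebraic manipulation of the expression within parentheses
this statement may, without changing the probability, be rewritten
as

(8.5)  $P\left\{\dfrac{s\sqrt{2N-2}}{\sqrt{2N-3} + z_{1-\alpha/2}} < \sigma < \dfrac{s\sqrt{2N-2}}{\sqrt{2N-3} + z_{\alpha/2}}\right\} = 1 - \alpha$

The inequality within the parentheses is a confidence interval
for $\sigma$.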

If $N$ is large enough that $\dfrac{2N-2}{2N-3}$ may be considered as 1 and
$z$ is small in relation to $\sqrt{2N - 3}$, a simpler inequality giving
approximately equal results can be used:

(8.6)  $P\left\{s\left(1 + \dfrac{z_{\alpha/2}}{\sqrt{2(N-1)}}\right) < \sigma < s\left(1 + \dfrac{z_{1-\alpha/2}}{\sqrt{2(N-1)}}\right)\right\} = 1 - \alpha$

This is precisely the result which would be arrived at on the
assumption that $s$ is normally distributed about $\sigma$ with standard
error $\dfrac{\sigma}{\sqrt{2(N-1)}}$, a statement which is sometimes made without
the necessary qualification that it holds only for large samples.
Example. Given $N = 308$, $s = 2.15$, to find an interval
estimate for $\sigma$ with confidence coefficient .95. Substituting in the
inequality of (8.6) produces the confidence interval

$2.15\left(1 - \dfrac{1.96}{\sqrt{614}}\right) < \sigma < 2.15\left(1 + \dfrac{1.96}{\sqrt{614}}\right)$

or $1.98 < \sigma < 2.32$
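
For comparison, the same example can be worked from the fuller inequality (8.5) instead of the simplified one. The sketch below is an illustration only; the function name is hypothetical, and it relies on the symmetry $z_{\alpha/2} = -z_{1-\alpha/2}$ of the unit normal distribution.

```python
import math

def sd_interval_large_sample(s, N, z):
    """Limits for sigma in the form of Formula (8.5).

    z is the upper normal point z_{1 - alpha/2}; the lower point is -z.
    """
    numerator = s * math.sqrt(2 * N - 2)
    root = math.sqrt(2 * N - 3)
    return numerator / (root + z), numerator / (root - z)

low, high = sd_interval_large_sample(s=2.15, N=308, z=1.96)
print(round(low, 2), round(high, 2))
```

With these data the limits from (8.5) come out slightly above the simplified limits 1.98 and 2.32, roughly 1.99 and 2.34, which indicates the size of the error introduced by the simplification at $N = 308$.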


Ratio of the Variances of Two Independent Samples from
Populations with the Same Variance. In Chapter 7 on more than
one occasion a statement was made that the observed variances
of two independent samples were not inconsistent with the
hypothesis that the samples came from populations with the same
variances, $\sigma_1^2 = \sigma_2^2 = \sigma^2$.

The statistic which is appropriate for testing this hypothesis
is the ratio of the two observed variances. In general, if each of
two independent statistics has a $\chi^2$ distribution and if each is
divided by its appropriate degrees of freedom, the ratio of $\chi_1^2/n_1$
to $\chi_2^2/n_2$ has a distribution called the F distribution * or the
variance-ratio distribution. As has already been seen, $(N-1)s^2/\sigma^2$
has a $\chi^2$ distribution.

Now if $\chi_1^2 = \dfrac{(N_1 - 1)s_1^2}{\sigma_1^2} = \dfrac{n_1 s_1^2}{\sigma_1^2}$, then $\chi_1^2/n_1 = s_1^2/\sigma_1^2$

and if $\chi_2^2 = \dfrac{(N_2 - 1)s_2^2}{\sigma_2^2} = \dfrac{n_2 s_2^2}{\sigma_2^2}$, then $\chi_2^2/n_2 = s_2^2/\sigma_2^2$

Then if $\sigma_1^2 = \sigma_2^2 = \sigma^2$,

$\dfrac{\chi_1^2/n_1}{\chi_2^2/n_2} = \dfrac{s_1^2}{s_2^2}$

has an F distribution.

Tables of the F Distribution. This distribution is really a
family of distributions. The shape changes with $n_1$ and also with
$n_2$. In Table X values of $n_1$, or the degrees of freedom associated
with the numerator of the variance ratio, are given in the
horizontal row across the top of the table. Values of $n_2$, or the
degrees of freedom associated with the denominator of the ratio,
are given in the vertical column at the left. Each cell of the table
represents a particular combination of $n_1$ and $n_2$. For that
particular combination there exists a unique probability distribution.
In this table only two points on that entire sampling distribution
are shown. These are $F_{.95}$ and $F_{.99}$ such that

$P\{s_1^2/s_2^2 > F_{.95}\} = .05$
$P\{s_1^2/s_2^2 > F_{.99}\} = .01$

* Named in honor of R. A. Fisher to whom the distribution is due.

In the cell for $n_1 = 4$, $n_2 = 4$, we read $F_{.95} = 6.39$ and $F_{.99} = 15.98$.
The position of $F_{.95} = 6.39$ is marked on Figure 6-7 on page 141,
but $F_{.99}$ is too far to the right to be shown. Now if $s_1^2/s_2^2 = 6.39$,
then $s_2^2/s_1^2 = \dfrac{1}{6.39} = .16$.

The tables do not show $F_{.05}$ and $F_{.01}$ because these can be
readily obtained from the tabulated values with $n_1$ and $n_2$ reversed.
Values at the lower end of the scale are crowded so close together
that they would have to be written with a good many digits to
afford adequate precision. It can be shown that $F_{\alpha} = \dfrac{1}{F_{1-\alpha}}$ when
$F_{\alpha}$ is based on $n_1$ and $n_2$ degrees of freedom and $F_{1-\alpha}$ is based on
$n_2$ and $n_1$ degrees of freedom. Therefore for Figure 6-7, $F_{.05} =
1/6.39 = .16$ and $F_{.01} = 1/15.98 = .06$.

Values of $F_{.80}$, $F_{.90}$, and $F_{.999}$ as well as $F_{.95}$ and $F_{.99}$ are tabulated
in Fisher and Yates, Statistical Tables. A still more extensive
table computed by Merrington and Thompson will be included
in the volume of tables soon to be published by the Biometrika
office.*

When the variances of two samples are being compared to test
the hypothesis that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, it is just as exceptional for $s_1^2/s_2^2$
to be less than $F_{.05}$ as it is for $s_1^2/s_2^2$ to be larger than $F_{.95}$. The
proper procedure then is to divide the larger variance by the
smaller. If the ratio exceeds $F_{.95}$, the hypothesis of equal variance
may be discarded at the .10 level; if the ratio exceeds $F_{.99}$, it is
discarded at the .02 level.

In the following chapter, a different manner of reading the 
tables will be considered, for use in a somewhat different kind of 
problem requiring a one-sided hypothesis. 

From the data on page 156 the variance of scores of 12 Barnard
students is found as $s_1^2 = 258.09$ and the variance of 11 general
studies students as $s_2^2 = 411.30$. To test the hypothesis

$\sigma_1^2 = \sigma_2^2 = \sigma^2$,

we have $F = 411.30/258.09 = 1.59$. In the cell of the F table for
which $n_1 = 10$ and $n_2 = 11$, we find that $F_{.95} = 2.86$. The observed
$F < F_{.95}$. Therefore, if samples of this size were drawn repeatedly
from populations with equal variance, and the variance ratio
computed for every possible pair of such samples, over 10% of such
ratios would be more exceptional than the observed ratio. The
observations cast no reasonable doubt on the hypothesis of equal
variance.
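
The procedure of dividing the larger variance by the smaller can be sketched as follows. The function name is hypothetical and the critical value must still be read from the F table; the sketch only makes explicit that the numerator degrees of freedom belong to the sample with the larger variance.

```python
def variance_ratio_test(var_a, df_a, var_b, df_b, f_crit):
    """Divide the larger variance by the smaller and compare with a tabled F.

    f_crit must be read from the F table with the numerator degrees of
    freedom taken from the sample that has the larger variance.
    Returns (F, n1, n2, reject).
    """
    if var_a >= var_b:
        f, n1, n2 = var_a / var_b, df_a, df_b
    else:
        f, n1, n2 = var_b / var_a, df_b, df_a
    return f, n1, n2, f > f_crit

# 12 Barnard students (11 d.f.) and 11 general studies students (10 d.f.);
# F.95 = 2.86 is the tabled value quoted for n1 = 10, n2 = 11.
f, n1, n2, reject = variance_ratio_test(258.09, 11, 411.30, 10, f_crit=2.86)
print(round(f, 2), n1, n2, reject)
```

Because the tabled $F_{.95}$ is used with the larger variance always in the numerator, a rejection here corresponds to the .10 level, as the text explains.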

Now let us consider the variances of the four samples of 5 cases
listed in Worksheet I on page 128. These provide 12 variance
ratios, namely

$s_1^2/s_2^2 = 1908.8/3517.3 = .54$        $s_3^2/s_1^2 = 811.7/1908.8 = .43$
$s_1^2/s_3^2 = 1908.8/811.7 = 2.35$        $s_3^2/s_2^2 = 811.7/3517.3 = .23$
$s_1^2/s_4^2 = 1908.8/2674 = .71$          $s_3^2/s_4^2 = 811.7/2674 = .30$
$s_2^2/s_1^2 = 3517.3/1908.8 = 1.84$       $s_4^2/s_1^2 = 2674/1908.8 = 1.40$
$s_2^2/s_3^2 = 3517.3/811.7 = 4.33$        $s_4^2/s_2^2 = 2674/3517.3 = .76$
$s_2^2/s_4^2 = 3517.3/2674 = 1.32$         $s_4^2/s_3^2 = 2674/811.7 = 3.29$

The sampling distribution of the variance ratio for $n_1 = 4$ and
$n_2 = 4$, which are the degrees of freedom associated with these
variance ratios, is presented in Figure 6-7. The reader should
plot the value of each of the 12 ratios shown above on the base
line of Figure 6-7. The largest ratio, with value 4.33, is smaller
than the indicated value of $F_{.95}$. The value of $F_{.05}$ is .16 (not
marked on the graph) and the smallest of the ratios, with value
.23, is larger than $F_{.05}$.
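
The twelve ratios can be generated mechanically, which is a convenient check on the arithmetic. The sketch below assumes only the four sample variances quoted from Worksheet I and the tabled $F_{.95} = 6.39$ for 4 and 4 degrees of freedom; the variable names are illustrative.

```python
from itertools import permutations

# Variances of the four samples of 5 cases (Worksheet I).
variances = {1: 1908.8, 2: 3517.3, 3: 811.7, 4: 2674.0}

# Every ordered pair (i, j) with i != j gives one ratio s_i^2 / s_j^2.
ratios = {(i, j): variances[i] / variances[j] for i, j in permutations(variances, 2)}

F95 = 6.39          # tabled value for n1 = 4, n2 = 4
F05 = 1 / F95       # lower point obtained from the reciprocal relation

outside = [pair for pair, f in ratios.items() if f > F95 or f < F05]
for (i, j), f in sorted(ratios.items()):
    print(f"s{i}^2/s{j}^2 = {f:.2f}")
print("ratios outside (F.05, F.95):", outside)
```

None of the ratios falls outside the two points, in agreement with the statement that the largest is below $F_{.95}$ and the smallest above $F_{.05}$.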

If two independent samples are drawn from populations in
which $\sigma_1^2 = \sigma_2^2 = \sigma^2$

then  $P\{F_{.05} < s_1^2/s_2^2 < F_{.95}\} = .90$
      $P\{s_1^2/s_2^2 > F_{.95}\} = .05$
      $P\{s_1^2/s_2^2 < F_{.05}\} = .05$


EXERCISE 8.2 
Reading Tables of the F distribution 

1. Verify the following probability statements: 

a. Suppose a great many samples of 10 cases each are drawn from one
population and a great many samples of 6 cases each from another, and
suppose the two populations have the same variance. Now let each sample
of 10 cases be paired with a sample of 6 cases in some random fashion
and let the ratio of their variances be computed. (Here $s_1^2$ denotes the
variance of the sample of 10 cases and $s_2^2$ that of the sample of 6 cases.)

Then $P\{s_1^2/s_2^2 > 4.78\} = .05$    See cell for which $n_1 = 9$, $n_2 = 5$
     $P\{s_2^2/s_1^2 > 3.48\} = .05$    See cell for which $n_1 = 5$, $n_2 = 9$

But $1/4.78 = .209$ and $1/3.48 = .287$

Therefore $P\{.287 < s_1^2/s_2^2 < 4.78\} = .90$
and       $P\{.209 < s_2^2/s_1^2 < 3.48\} = .90$
Similarly $P\{.165 < s_1^2/s_2^2 < 10.15\} = .98$
          $P\{.099 < s_2^2/s_1^2 < 6.06\} = .98$

b. A sample of 21 cases has a variance of 25.3 and a sample of 18 cases
has a variance of 12.2, the variance ratio being $F = 25.3/12.2 = 2.07$.
If the two population variances are equal there is a probability of .05 of
drawing samples such that $s_1^2/s_2^2 > 2.23$. Since $F < F_{.95}$, the probability
of drawing a pair of samples with variances as dissimilar as these is greater
than .10.

2. Fill the blanks in the following: 


Probability of 

at least as 

М Ny s т F Ем Fn exceptional 
anF 

S41 7 Bl 68-— 94 —P»40- 
. 12 60 104 36 Р < 02 


а. 
b. 
€. 15 120 17.4 30.6 
а. 25 10 43.5 9.1 
е. 210 27 33.5 15.3 


3. From the F table, read values of F for which $n_1 = 1$ and $n_2$ has the
values indicated in the column $n$ below. Enter $F_{.95}$ and $F_{.99}$ on the
appropriate lines.

From a table of "Student's" distribution read values of $t_{.975}$ and
$t_{.995}$ for the values of $n$ indicated. Enter these in the appropriate lines.

Square the recorded values of $t_{.975}$ and $t_{.995}$ and enter on the appropriate
lines. If you have made no mistakes $F_{.95} = t^2_{.975}$ and $F_{.99} = t^2_{.995}$ except for
a possible difference in the last digit due to rounding error.

$n$      $F_{.95}$      $t_{.975}$      $t^2_{.975}$      $F_{.99}$      $t_{.995}$      $t^2_{.995}$

4. From the F table, read values of F for which $n_2 = \infty$ and $n_1$ has the
values indicated in the column $n$ below. Enter $F_{.95}$ and $F_{.99}$ on the
appropriate lines.

From the $\chi^2$ table, read $\chi^2_{.95}$ and $\chi^2_{.99}$ for the indicated values of $n$.
Enter on the appropriate lines.

Divide each recorded $\chi^2$ by the corresponding value of $n$ and enter on
the appropriate line. If you have made no mistakes, $F_{.95} = \chi^2_{.95}/n$ and
$F_{.99} = \chi^2_{.99}/n$ except for a possible difference in the last digit due to
rounding error.

$n$     $F_{.95}$    $\chi^2_{.95}$    $\chi^2_{.95}/n$    $F_{.99}$    $\chi^2_{.99}$    $\chi^2_{.99}/n$
 2      ____       ____           ____             ____       ____           ____
 8      ____       ____           ____             ____       ____           ____
10      ____       ____           ____             ____       ____           ____
20      ____       ____           ____             ____       ____           ____
30      ____       ____           ____             ____       ____           ____

5. The values of F for $n_1 = 1$ and $n_2 = \infty$ at several significance levels
given below are taken from Statistical Tables by Fisher and Yates. Read
values of $z$ at the same significance levels from a table of the normal
distribution and enter on the appropriate lines. Enter $z^2$. Compare with
the F values. They should agree except for differences due to rounding
error.

$F_{.80} = 1.64$    $F_{.90} = 2.71$    $F_{.95} = 3.84$    $F_{.99} = 6.64$    $F_{.999} = 10.83$
$z_{.90}$ = ____    $z_{.95}$ = ____    $z_{.975}$ = ____   $z_{.995}$ = ____   $z_{.9995}$ = ____
$z^2_{.90}$ = ____  $z^2_{.95}$ = ____  $z^2_{.975}$ = ____ $z^2_{.995}$ = ____ $z^2_{.9995}$ = ____

Relation of the F Table to Tables of "Student's" Distribution,
the Chi-square Distribution and the Normal Distribution. As
has been said before, each cell in the F table represents particular
values of $n_1$ and $n_2$ to which a unique probability distribution
corresponds. Questions 3, 4, and 5 in Exercise 8.2 suggest that
for a certain subset of these cells the F distribution is closely
related to the chi-square distribution; for another subset it is
closely related to "Student's" distribution; for one particular cell
it is closely related to the normal distribution.

Consider the cells in the vertical column at the extreme left
of the F table. For these cells $n_1 = 1$. Exercise 8.2 has indicated
that for these cells $F_{.95} = t^2_{.975}$ and $F_{.99} = t^2_{.995}$, $t$ having $n_2$ degrees
of freedom. Problems utilizing this relationship will be taken up
in Chapter 9.

Consider the cells in the last row of the F table, where $n_2 = \infty$.
Exercise 8.2 has indicated that for these cells $F_{.95} = \chi^2_{.95}/n$ and
$F_{.99} = \chi^2_{.99}/n$. Here $n$ is the degrees of freedom for $\chi^2$ and the degrees
of freedom for the numerator variance in F. This interesting
relationship is easily understood. $F = s_1^2/s_2^2$. When $n_2$ approaches
infinity, $s_2^2$ approaches $\sigma^2$ and F approaches $s_1^2/\sigma^2$. But

$n_1 s_1^2/\sigma^2 = \chi^2$ so $s_1^2/\sigma^2 = \chi^2/n_1$.

Therefore when $n_2$ approaches infinity, F approaches $\chi^2/n_1$.

Consider the single cell in the last row and the left-hand column.
In Exercise 8.2, we noted that the entries for $F_{1-\alpha}$ in the
probability distribution for this cell are the squares of the entries for
$z_{1-\alpha/2}$ in the normal distribution. Because this cell is in the column
in which $n_1 = 1$, we have $F = t^2$. Because it is in the row for which
$n_2 = \infty$ we have $F_{1-\alpha} = \chi^2_{1-\alpha}/n_1 = \chi^2_{1-\alpha}/1 = \chi^2_{1-\alpha}$. Then for this
single cell, $\chi^2_{1-\alpha} = t^2_{1-\alpha/2}$, $\chi^2$ having 1 degree of freedom and $t$ having
infinite degrees of freedom. This brings us back to a situation
noted in Chapter 4. There it was seen that for large values of $N$
it made no difference whether for the double dichotomy

   a | b
   c | d

one computed

$\chi^2 = \dfrac{N(ad - bc)^2}{(a+b)(a+c)(b+d)(c+d)}$

and referred it to the $\chi^2$ distribution with 1 degree of freedom or
computed

$z = \dfrac{p_1 - p_2}{\sqrt{\dfrac{p(1-p)N}{N_1 N_2}}}$

and referred it to the normal distribution. In this problem
$\chi^2_{1-\alpha} = z^2_{1-\alpha/2}$.
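
The agreement between the two computations for a double dichotomy can be verified directly. In the sketch below the cell frequencies are hypothetical, the function names are introduced only for illustration, and the rows of the table are taken as the two groups, so that $p_1 = a/(a+b)$, $p_2 = c/(c+d)$, and $p = (a+c)/N$.

```python
import math

def chi2_2x2(a, b, c, d):
    """Chi-square for a double dichotomy with cell frequencies a, b / c, d."""
    N = a + b + c + d
    return N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

def z_two_proportions(a, b, c, d):
    """Normal deviate for the difference of two proportions.

    Rows are the two groups; the first column counts the cases having
    the attribute in question.
    """
    N1, N2 = a + b, c + d
    N = N1 + N2
    p1, p2 = a / N1, c / N2
    p = (a + c) / N                      # pooled proportion
    return (p1 - p2) / math.sqrt(p * (1 - p) * N / (N1 * N2))

a, b, c, d = 30, 20, 18, 32              # hypothetical frequencies
chi2 = chi2_2x2(a, b, c, d)
z = z_two_proportions(a, b, c, d)
print(round(chi2, 4), round(z ** 2, 4))
```

The two printed values coincide because $z^2$ reduces algebraically to the $\chi^2$ expression, which is why referring $\chi^2$ to its table with 1 degree of freedom and $z$ to the normal table lead to the same decision.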

Comparison of Two Variances Based on Related Scores.
Suppose it is desired to compare the variance of a group at the
beginning of an experiment with the variance of the same individuals
at the end of the experiment in order to test the hypothesis that
no genuine change in variance has occurred, that $\sigma_1^2 = \sigma_2^2 = \sigma^2$.
The ratio $F = s_1^2/s_2^2$ does not provide a test for this hypothesis
because under these circumstances $s_1^2$ and $s_2^2$ are based on
correlated scores and cannot be regarded as independent
variances.

Let $\sigma_1^2$ and $\sigma_2^2$ be the initial and final variance in the
population and $\rho_{12}$ the correlation between initial and final scores for the
population.

Let $s_1^2$, $s_2^2$ and $r_{12}$ be the corresponding values for the sample.
Then if $\sigma_1^2 = \sigma_2^2$, the statistic

(8.7)  $t = \dfrac{(s_2^2 - s_1^2)\sqrt{N - 2}}{2 s_1 s_2 \sqrt{1 - r_{12}^2}}$

has "Student's" distribution with $N - 2$ degrees of freedom. An
equivalent form of (8.7) which may be more convenient to
compute is

(8.8)  $t = \dfrac{(\Sigma x_2^2 - \Sigma x_1^2)\sqrt{N - 2}}{2\sqrt{\Sigma x_1^2\, \Sigma x_2^2 - (\Sigma x_1 x_2)^2}}$

An illustration may be drawn from the Westover study
previously mentioned. For the 45 freshmen using special practice
exercises, the standard deviations of the Total Comprehension
Scores at the beginning and end of the remedial period were
$s_1 = 4.05$ and $s_2 = 11.45$. The correlation between initial and final
scores was .39. Is this a significant change in variability?

$t = \dfrac{\{(11.45)^2 - (4.05)^2\}\sqrt{43}}{2(4.05)(11.45)\sqrt{1 - (.39)^2}} = 8.8$

The change in variance is so large that no reasonable investigator
would claim it was due to chance.

The value $r = .39$ is the value published by Westover as the
correlation between initial and final scores for the entire group of
140 subjects. No value for the correlation coefficient was quoted
for the 45 students considered in this problem. Knowing that
$r = .39$ may be in error one way or the other, we should explore as
far as possible the consequences of such error. Suppose the
correct value of $r$ is larger than .39. Then $\sqrt{1 - r^2}$ will be even
smaller than $\sqrt{1 - (.39)^2}$, the denominator will be smaller than
the one we have computed and $t$ will be even larger than 8.8.
The final decision to regard the change in variance as due to
systematic rather than chance causes would be strengthened even
further. Now suppose the correct value is smaller than $r = .39$.
Then the denominator would be larger and $t$ smaller than the one
computed here. The extreme value in this direction would occur
if $r$ were 0. In that case

$t = \dfrac{\{(11.45)^2 - (4.05)^2\}\sqrt{43}}{2(4.05)(11.45)} = 8.1$

and the interpretation is not altered. If $t$ were near the critical
value, an error in the size of $r$ might have serious consequences.
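
The sensitivity of $t$ to the assumed correlation can be explored with a short computation. The sketch below is illustrative: the function name is hypothetical, and the numerator is taken as final minus initial variance, following the worked example, so that a growth in variance gives a positive $t$.

```python
import math

def related_variance_t(s1, s2, r, N):
    """t with N - 2 degrees of freedom for two variances from correlated scores.

    s1, s2 -- initial and final standard deviations of the same N individuals
    r      -- correlation between initial and final scores
    """
    numerator = (s2 ** 2 - s1 ** 2) * math.sqrt(N - 2)
    denominator = 2 * s1 * s2 * math.sqrt(1 - r ** 2)
    return numerator / denominator

t_published_r = related_variance_t(4.05, 11.45, r=0.39, N=45)
t_zero_r = related_variance_t(4.05, 11.45, r=0.0, N=45)
print(round(t_published_r, 1), round(t_zero_r, 1))

# t over a range of hypothetical correlations: the smallest |t| occurs at r = 0.
for r in (0.0, 0.2, 0.39, 0.6, 0.8):
    print(r, round(related_variance_t(4.05, 11.45, r, 45), 2))
```

Since $\sqrt{1 - r^2}$ is largest when $r = 0$, any other value of $r$ can only increase the absolute size of $t$, which is the point of the argument above.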

The Variances of Several Samples from Populations Having
the Same Variance. When several samples are observed, one
often wants to know whether it is reasonable to believe that all
their populations have the same mean, $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$.
That hypothesis will be explored in Chapter 9. One also often
wants to know whether it is reasonable to believe that all their
populations have the same variance, $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$.
That hypothesis can be considered at this point.

There are several different methods by which this hypothesis
can be tested, some of which are described in the references.
We shall describe two of these methods: (A) a method of great
simplicity and convenience which can be used when all samples
have the same number of cases, but which requires a special
probability table, and (B) a method which can be used even when
the number of cases is not uniform from sample to sample, which
makes use of the chi-square table, but which is arithmetically
more laborious than the first.

A. All samples of uniform size. To illustrate this method we
may use the data of Table 9.2, page 200. The variances for the
three groups of 11 individuals each are as follows:

$s_A^2 = 1188.97$      $s_B^2 = 2837.47$      $s_C^2 = 2140.40$

We wish to test the hypothesis $\sigma_A^2 = \sigma_B^2 = \sigma_C^2 = \sigma^2$. There are
6 possible F ratios among these three variances, and the largest
of these ratios is $\dfrac{s_B^2}{s_A^2} = \dfrac{2837.47}{1188.97} = 2.39$. If this largest ratio is not
significantly large, none of the others is. Hartley calls this
largest F ratio

(8.9)  $F_{max} = \dfrac{s^2_{max}}{s^2_{min}}$

Table VII gives the upper 5% and 1% points for $F_{max}$ in a set of
$k$ mean squares all based on $n$ degrees of freedom. In the data
of Table 9.2, $k = 3$ and $n = 10$. In the appropriate cell of Table
VII we find the entry 4.85, indicating that

$P\{F_{max} > 4.85\} = .05$

The critical region is $F_{max} > 4.85$. The observed value $F_{max} = 2.39$
does not fall in the critical region and so the hypothesis

$\sigma_A^2 = \sigma_B^2 = \sigma_C^2$

is acceptable.
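
Method A amounts to one division and one table lookup, as the following sketch shows. The function name is hypothetical, and the critical value 4.85 is the entry quoted from Table VII for $k = 3$ and $n = 10$; it is not computed here.

```python
def hartley_f_max(variances):
    """Largest of all possible variance ratios: s^2_max / s^2_min."""
    return max(variances) / min(variances)

group_variances = {"A": 1188.97, "B": 2837.47, "C": 2140.40}
f_max = hartley_f_max(group_variances.values())

critical_5pct = 4.85        # Table VII entry for k = 3, n = 10
in_critical_region = f_max > critical_5pct
print(round(f_max, 2), in_critical_region)
```

Only the extreme variances enter the statistic, which is what makes the test so quick, and also why it requires every sample to have the same degrees of freedom.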

B. Not all samples of uniform size. Harris administered a
test of information concerning the social factors of the community
to teachers in four St. Louis schools, and found the means and
variances to be as follows:

School           $N_i$    $n_i$    $s_i^2$    $\bar{X}_i$
1. Jefferson      20       19      11.22      12.50
2. Banneker       22       21      10.30      10.40
3. Cole           22       21       6.20      11.91
4. Marshall       15       14       9.79      11.94
                  79

She wanted to know whether the 79 teachers could be combined
into a single group or whether these school differences would throw
suspicion on the hypotheses

$\mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu$
and $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$
Applying the method of section A, we note that the ratio of the
largest variance to the smallest is

$F_{max} = \dfrac{11.22}{6.20} = 1.81$

Here $k = 4$ and $n$ varies from 14 to 21. Examination of Table VII
indicates that the critical value decreases as $n$ increases.
Therefore if for the largest value of $n$ the observed $F_{max}$ is not in the
critical region we shall know it would not be in the critical region
for a smaller $n$. Here we find that for

$k = 4$ and $n = 20$, the critical region is $F_{max} > 3.25$
and even for
$k = 4$ and $n = 30$, the critical region is $F_{max} > 2.59$

Clearly the observed value is not in the critical region and the
hypothesis $\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2$ is sustained.

If the Hartley test described above had resulted in the
decision to reject the hypothesis when Table VII was entered with
the largest $n$ and to accept when entered with the smallest $n$, as
might have happened, a different test allowing for variations in
$n$ would be needed. The great simplicity of the Hartley test
makes its use advisable whenever it gives unambiguous results.
In other situations Bartlett's test, to be described below, may be
used.

Let $s_1^2, s_2^2, \cdots, s_k^2$ be the variances of $k$ independent samples
having respectively $n_1, n_2, \cdots, n_k$ degrees of freedom. Then under
the hypothesis that $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2$ the estimate of $\sigma^2$
obtained by pooling the variances of the $k$ samples is

(8.10)  $s^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2 + \cdots + n_k s_k^2}{n_1 + n_2 + \cdots + n_k} = \dfrac{\Sigma n_i s_i^2}{n}$  where $n = \Sigma n_i$

We compute the statistic

(8.11)  $B = \dfrac{2.3026}{C}\left\{n\left(\log_{10} \sum_{i=1}^{k} n_i s_i^2 - \log_{10} n\right) - \sum_{i=1}^{k} n_i \log_{10} s_i^2\right\} = B'/C$

(8.12)  where $C = 1 + \dfrac{1}{3(k-1)}\left(\sum_{i=1}^{k} \dfrac{1}{n_i} - \dfrac{1}{n}\right)$

Even if some of the $n_i$ are quite small, say 5 or more, the statistic
$B$ has approximately a chi-square distribution with $k - 1$ degrees of
freedom. It is not always necessary to compute $C$. If $B'$ is not
significantly large, $B$ will be even smaller because $C$ is always larger
than 1. Therefore $C$ need be computed only if $B'$ appears significantly
large.

The procedure known as the Bartlett test for homogeneity of
variance is shown in Table 8.2 where it is applied to the variances
of the same four St. Louis schools discussed at the beginning of
this section. The reader who is not familiar with logarithms will
need to refer to an algebra text or to reference eleven.


TABLE 8.2  Computation to Test the Homogeneity of Variance of Teachers in
4 St. Louis Schools on a Test of Information about the Community.*

School          $N_i$   $n_i$   $n_i s_i^2$   $s_i^2$   $\log_{10} s_i^2$   $n_i \log_{10} s_i^2$   $1/n_i$
1. Jefferson     20      19      213.2       11.22      1.0500          19.9500           .0526
2. Banneker      22      21      216.3       10.30      1.0124          21.2604           .0476
3. Cole          22      21      130.2        6.20      0.7924          16.6404           .0476
4. Marshall      15      14      137.1        9.79      0.9908          13.8712           .0714
Sum              79      75      696.8                  3.8456          71.7220           .2192

$\log_{10} \Sigma n_i s_i^2 = \log_{10} 696.8 = 2.8431$
$\log_{10} \Sigma n_i = \log_{10} 75 = 1.8751$
Difference = 0.9680

$B' = 2.3026\{75(0.9680) - 71.7220\} = 2.3026(0.8780) = 2.022$

$C = 1 + \dfrac{1}{3(3)}\left(.2192 - \dfrac{1}{75}\right) = 1.0229$        $B = \dfrac{B'}{C} = \dfrac{2.022}{1.0229} = 1.98$

* From Harris, Teachers' Social Knowledge and its Relation to Pupils' Responses,
page 22.

The statistic $B = 1.98$ is referred to a chi-square table with
$k - 1 = 3$ degrees of freedom, where $\chi^2_{.90}$ is found to be 6.3. As
$B < \chi^2_{.90}$ the evidence does not refute the hypothesis that

$\sigma_1^2 = \sigma_2^2 = \sigma_3^2 = \sigma_4^2 = \sigma^2$.


This conclusion is in agreement with the conclusion previously 
reached by reference to the Hartley Table. 
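
The Bartlett computation can also be carried out without a logarithm table. The sketch below is an illustration with a hypothetical function name; it uses natural logarithms, which makes the factor 2.3026 unnecessary, and takes the variances and degrees of freedom of the four schools as its only data. Because it works from unrounded logarithms, its result may differ slightly in the second decimal from a worksheet built on four-place tables.

```python
import math

def bartlett(variances, dfs):
    """Bartlett's statistic for homogeneity of variance.

    variances -- sample variances s_i^2
    dfs       -- their degrees of freedom n_i
    Returns (B_prime, C, B), where B = B_prime / C is referred to
    chi-square with k - 1 degrees of freedom.
    """
    k = len(dfs)
    n = sum(dfs)
    pooled = sum(df * var for df, var in zip(dfs, variances)) / n
    b_prime = n * math.log(pooled) - sum(
        df * math.log(var) for df, var in zip(dfs, variances)
    )
    c = 1 + (sum(1 / df for df in dfs) - 1 / n) / (3 * (k - 1))
    return b_prime, c, b_prime / c

school_variances = [11.22, 10.30, 6.20, 9.79]   # Jefferson, Banneker, Cole, Marshall
school_dfs = [19, 21, 21, 14]
b_prime, c, b = bartlett(school_variances, school_dfs)
print(round(b_prime, 3), round(c, 4), round(b, 2))
```

The statistic falls well below the tabled $\chi^2_{.90}$ for 3 degrees of freedom, so the conclusion about the four schools is unchanged.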

Cochran has proposed a statistic which is the ratio of the
largest of the variances in $k$ independent samples each having $n$
degrees of freedom to the sum of all the variances and has derived
its sampling distribution. This statistic is meaningful and simple
to compute, but a special table is needed for its sampling
distribution. Since this book is not attempting to present all the useful
sampling distributions but only those which are needed most often,
Cochran's test is mentioned here for the information of the reader,
but without necessary tables. These tables may be found on pages
390-391 in Techniques of Statistical Analysis.

Bartlett's test is useful to detect lack of homogeneity in general,
of any sort. It is not so sensitive as Cochran's test for detecting
lack of homogeneity arising because the variance of one
population is considerably larger than the variances of the others.

The 5% and 1% significance levels of the numerator of Bartlett's
statistic have been tabulated by Thompson and Merrington.
Methods for testing hypotheses concerning means of two
populations were considered in Chapter 7. The present chapter
will deal with hypotheses concerning the means of several
populations. The method for testing the hypothesis $\mu_1 = \mu_2$ presented
on page 156 will be shown to be a special case of the method for
testing the hypothesis $\mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$ to be presented in
the following pages.

Use of Double Subscripts. In a great many statistical
problems, in fact in most of those which will be discussed in the
remainder of this book, observations need to be classified on two or
more variables simultaneously. As the classification becomes more
complex it is helpful to have an explicit symbolism to indicate
exactly which values contribute to a particular sum or mean or
variance. In such situations clarity may be promoted by the use
of double (or multiple) subscripts, as in the following rectangular
arrangement.

$X_{11}$  $X_{12}$  $X_{13}$  $X_{14}$  $X_{15}$  $X_{16}$  $X_{17}$
$X_{21}$  $X_{22}$  $X_{23}$  $X_{24}$  $X_{25}$  $X_{26}$  $X_{27}$
$X_{31}$  $X_{32}$  $X_{33}$  $X_{34}$  $X_{35}$  $X_{36}$  $X_{37}$

Each entry here represents a numerical score. The symbol $X_{23}$
is read "X two three" or "X sub two three," not "X twenty-three."
If the arrangement represents the scores of 7 persons on
3 tests, the entries in any one vertical column are the 3 scores
made by one person, the scores in any one horizontal row are the
scores of 7 persons on one test. If this arrangement represents
3 samples of 7 individuals each, the entries in the first column
represent the scores of the first individual drawn in each sample,
the entries in the first row represent the scores of the 7 individuals
of the first sample. Innumerable other interpretations might be
made of such an arrangement. A rectangular arrangement of
numbers is called a matrix.

Assuming that the arrangement represents the scores of 7
persons on 3 tests, study the following parallels:

$X_{26}$ represents a score on Test 2 made by the sixth person

$X_{i3}$ represents a score made by person 3 on one of the tests, on an
unspecified test

$X_{2j}$ represents a score on Test 2 made by some person, an
unspecified person

$X_{ij}$ represents a score on one of the tests made by one of the
persons, neither test nor person being specified.

Then the sum of the scores

in the first row is $\sum_{j=1}^{7} X_{1j}$ and in some row is $\sum_{j=1}^{7} X_{ij}$

in the first column is $\sum_{i=1}^{3} X_{i1}$ and in some column is $\sum_{i=1}^{3} X_{ij}$

The sum of all 21 scores is

$\sum_{i=1}^{3}\sum_{j=1}^{7} X_{ij} = \sum_{j=1}^{7}\sum_{i=1}^{3} X_{ij}$

The first letter or digit in the subscript customarily indicates the
row; the second letter or digit indicates the column. In situations
such as these omission of the variable subscripts and limits of
summation begets confusion. If one wrote $\Sigma X$ for data from this
matrix there would be no way of telling whether the expression
means the sum for a row (and which row?), the sum for a column
(and which column?), or the sum for the entire matrix.

We also need a symbolism which will permit us to distinguish
between the mean of the 2d row which is $\frac{1}{7}\sum_{j=1}^{7} X_{2j}$ and the mean of
the 2d column which is $\frac{1}{3}\sum_{i=1}^{3} X_{i2}$. The important aspect of this
symbolism is to be able to distinguish whether 2 is the first
subscript or the second. To make this point clear it is customary
to write $\bar{X}$ with subscripts similar to the subscripts for $X$ in the
summation but with a dot replacing the variable subscript over
which summation has taken place. Thus in general

$\frac{1}{10}\sum_{i=1}^{10} X_{i3}$ would be $\bar{X}_{.3}$        $\frac{1}{r}\sum_{i=1}^{r} X_{ij}$ would be $\bar{X}_{.j}$

$\frac{1}{40}\sum_{j=1}^{40} X_{7j}$ would be $\bar{X}_{7.}$        $\frac{1}{c}\sum_{j=1}^{c} X_{ij}$ would be $\bar{X}_{i.}$

$\frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} X_{ij}$ would be $\bar{X}_{..}$

If the meaning is clear without it, the dot should be omitted.
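
The double-subscript and dot conventions translate directly into computation. In the sketch below the scores are hypothetical, and the program counts rows and columns from 0 whereas the text counts from 1; the variable names are illustrative only.

```python
# Hypothetical scores of 7 persons (columns) on 3 tests (rows).
# X[i][j] holds the entry in row i, column j, counted from 0.
X = [
    [12, 15, 11, 14, 13, 16, 10],
    [9, 14, 12, 10, 15, 11, 13],
    [14, 13, 15, 12, 11, 10, 16],
]
r, c = len(X), len(X[0])

# Row totals sum over the column subscript j; column totals sum over i.
row_totals = [sum(X[i][j] for j in range(c)) for i in range(r)]
col_totals = [sum(X[i][j] for i in range(r)) for j in range(c)]

# The dot marks the subscript that has been summed over.
row_means = [total / c for total in row_totals]      # X-bar sub i-dot
col_means = [total / r for total in col_totals]      # X-bar sub dot-j
grand_mean = sum(row_totals) / (r * c)               # X-bar sub dot-dot

print(row_totals, col_totals)
print(round(grand_mean, 3))
```

Summing the row totals or the column totals gives the same grand total, which is the equality of the two double sums noted earlier.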
A similar convention may be applied to the variance and the
standard deviation. Thus

$\frac{1}{9}\sum_{i=1}^{10} (X_{i3} - \bar{X}_{.3})^2$ would be written $s^2_{.3}$

$\frac{1}{39}\sum_{j=1}^{40} (X_{7j} - \bar{X}_{7.})^2$ would be written $s^2_{7.}$

$\frac{1}{r-1}\sum_{i=1}^{r} (X_{ij} - \bar{X}_{.j})^2$ would be written $s^2_{.j}$

$\sqrt{\frac{1}{c-1}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.})^2}$ would be written $s_{i.}$

$\frac{1}{rc-1}\sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{..})^2$ would be written $s^2_{..}$

Often $\bar{X}_{..}$, $s^2_{..}$ and $s_{..}$ are written merely as $\bar{X}$, $s^2$ and $s$ when
omission of the .. creates no confusion.


EXERCISE 9.1 

Write out all entries for a matrix of 4 rows and 3 columns, using $X$
with appropriate subscripts for each entry.

Write the expression for each row total in the matrix just written, that
for the first row being $\sum_{j=1}^{3} X_{1j}$.

Write the expression for each column total in the matrix, that for the
first column being $\sum_{i=1}^{4} X_{i1}$.

Write the expression for the total for all 12 entries.

Write an expression for each row mean, the mean for the first row
being $\bar{X}_{1.} = \frac{1}{3}\sum_{j=1}^{3} X_{1j}$.

Write an expression for each column mean, the mean for the first
column being $\bar{X}_{.1} = \frac{1}{4}\sum_{i=1}^{4} X_{i1}$.

Write an expression for the mean of the entire matrix.

Write an expression for the variance of each row, the variance for the
first row being $s^2_{1.} = \frac{1}{2}\sum_{j=1}^{3} (X_{1j} - \bar{X}_{1.})^2$.

Write an expression for the variance of each column, the variance for
the first column being $s^2_{.1} = \frac{1}{3}\sum_{i=1}^{4} (X_{i1} - \bar{X}_{.1})^2$.

Write an expression for the variance of all 12 entries.

Write expressions for the mean and the variance of an unspecified row,
the mean and the variance of an unspecified column.

Check what you have written against the symbols recorded in Table 9.1.


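TABLE 9.1 Symbols for Mean and Variance of a Subgroup

Individual scores                Row total                  Row mean                                              Row variance
$X_{11}$  $X_{12}$  $X_{13}$     $\sum_{j=1}^{3} X_{1j}$    $\bar X_{1.} = \frac{1}{3}\sum_{j=1}^{3} X_{1j}$     $s_{1.}^2 = \frac{1}{2}\sum_{j=1}^{3}(X_{1j}-\bar X_{1.})^2$
$X_{21}$  $X_{22}$  $X_{23}$     $\sum_{j=1}^{3} X_{2j}$    $\bar X_{2.} = \frac{1}{3}\sum_{j=1}^{3} X_{2j}$     $s_{2.}^2 = \frac{1}{2}\sum_{j=1}^{3}(X_{2j}-\bar X_{2.})^2$
$X_{31}$  $X_{32}$  $X_{33}$     $\sum_{j=1}^{3} X_{3j}$    $\bar X_{3.} = \frac{1}{3}\sum_{j=1}^{3} X_{3j}$     $s_{3.}^2 = \frac{1}{2}\sum_{j=1}^{3}(X_{3j}-\bar X_{3.})^2$
$X_{41}$  $X_{42}$  $X_{43}$     $\sum_{j=1}^{3} X_{4j}$    $\bar X_{4.} = \frac{1}{3}\sum_{j=1}^{3} X_{4j}$     $s_{4.}^2 = \frac{1}{2}\sum_{j=1}^{3}(X_{4j}-\bar X_{4.})^2$

                     Column 1                                               Column 2                                               Column 3
Column total         $\sum_{i=1}^{4} X_{i1}$                                $\sum_{i=1}^{4} X_{i2}$                                $\sum_{i=1}^{4} X_{i3}$
Column mean          $\bar X_{.1} = \frac{1}{4}\sum_{i=1}^{4} X_{i1}$       $\bar X_{.2} = \frac{1}{4}\sum_{i=1}^{4} X_{i2}$       $\bar X_{.3} = \frac{1}{4}\sum_{i=1}^{4} X_{i3}$
Column variance      $s_{.1}^2 = \frac{1}{3}\sum_{i=1}^{4}(X_{i1}-\bar X_{.1})^2$   $s_{.2}^2 = \frac{1}{3}\sum_{i=1}^{4}(X_{i2}-\bar X_{.2})^2$   $s_{.3}^2 = \frac{1}{3}\sum_{i=1}^{4}(X_{i3}-\bar X_{.3})^2$

Row total for an unspecified row = $\sum_{j=1}^{3} X_{ij}$
Column total for an unspecified column = $\sum_{i=1}^{4} X_{ij}$
Sum of column totals = $\sum_{j=1}^{3}\sum_{i=1}^{4} X_{ij}$
Sum of row totals = $\sum_{i=1}^{4}\sum_{j=1}^{3} X_{ij}$
Mean of entire group = $\bar X = \frac{1}{12}\sum_{i=1}^{4}\sum_{j=1}^{3} X_{ij}$
Variance of entire group = $s^2 = \frac{1}{11}\sum_{i=1}^{4}\sum_{j=1}^{3}(X_{ij}-\bar X)^2$
Sum of row totals = sum of column totals.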
Comparison of Three Means. As an illustration of a problem
in which the means of three populations are to be compared, con- 
sider an experiment carried out by Schroeder? to study factors 
influencing performance in archery. The standard “Round” in 
archery consists of sets of 6 shots at each of three ranges, 30, 40, 
and 50 yards, shot in that order. One of the goals of Schroeder's 
study was to find out whether order of shooting the different 
ranges has an effect on performance. If the ranges are shot in 
non-standard order will the scores be comparable with scores ob- 
tained from the standard series? 


TABLE 9.2 Archery Scores at the 50-Yard Range Made by 33 College Women.*

Group          Individual scores                                     $\sum X = T$   $\bar X$   $\sum X^2$   $(\sum X)^2/N$   $\sum (X-\bar X)^2$
A              99, 114, 51, 78, 134, 71, 66, 33, 146, 80, 105        977            88.82      98665        86775.3          11889.7
B              104, 26, 136, 87, 187, 34, 106, 63, 155, 68, 29       995            90.45      118377       90002.3          28374.7
C              41, 83, 61, 189, 88, 80, 141, 112, 141, 112, 173      1221           111.0      156935       135531.0         21404.0
Sum                                                                  3193                      373977       312308.6         61668.4
Entire group                                                         3193           96.76      373977       308946.9         65030.1

* Data from Schroeder.


A sample of 33 subjects was chosen and divided randomly 
into 3 equal subsamples. Each group of 11 subjects was given 
6 lessons in archery, the order of shooting the ranges being changed 
from lesson to lesson. (Note that there are 6 possible orders, be- 
cause the number of arrangements of 3 things is 6.) Now con- 
sider the performance at the 50-yard range when this range was 
shot first by Group A, second by B, and third by C. Table 9.2 
presents the resulting scores, each score there being the sum of 
the performance scores of one individual in two lessons. The 
means of the three groups seem to suggest that the range shot last
has the advantage. However, before concluding that the advantage
is real, existing in the population of all possible observations,
it is necessary to determine that the variation among the three
observed means is too great to be attributed to the chance effects
of sampling.
Note that if the study had been set up to compare scores made
on the range shot last with scores on either of the other ranges,
it would be satisfactory to combine the scores of Groups A and B
into one group of 22 cases, which for convenience we may call D,
and to test the hypothesis that $\mu_C = \mu_D$ by the methods of Chapter
7. However, the experiment was not set up to test that hypothesis
but to test the hypothesis $\mu_A = \mu_B = \mu_C$. When observation of the
data suggests a new hypothesis, that new hypothesis can never be
satisfactorily tested on the data from which it was obtained but
requires a new set of observations.

Mathematical Model. To specify the mathematical popula- 
tion (that is to provide a mathematical model), we shall assume 
that each of the three samples is drawn at random from a normal 
population and that all three populations have the same variance 
but not necessarily the same mean. 


Population    Mean        Variance
A             $\mu_A$     $\sigma^2$
B             $\mu_B$     $\sigma^2$
C             $\mu_C$     $\sigma^2$


For precision, the assumption that all three samples were 
drawn from normal populations with the same variance needs to 
be investigated. Application of the tests of normality described 
in Chapter 5 indicates that the assumption of normality may be 
accepted. (Usually it is satisfactory to look at the grouped fre- 
quency distribution of scores and to decide more or less intuitively 
that the distribution appears fairly normal.) The assumption of 
equality of variance can be tested by comparing the three sample
variances by the method described in Chapter 8. This test indicates
that the assumption of equality of variance may also be
accepted.

Hypothesis to be Tested. The hypothesis to be tested is 

$$\mu_A = \mu_B = \mu_C = \mu$$

If this hypothesis is true and the assumption of normal populations
with equal variance is justified, then the 33 scores may be
regarded as random observations from a single normal population
with mean $\mu$ and variance $\sigma^2$. The means of the three samples
would then be random observations from a normal population of
sample means for which $E(\bar X) = \mu$ and $\sigma_{\bar X}^2 = \sigma^2/11$, as explained
in Chapter 7.
The unknown variance $\sigma^2$ can be estimated from the variation
among the three sample means. It can also be estimated from
the variation among the archers within groups. The ratio of these
two estimates of $\sigma^2$ provides the statistic by means of which the
hypothesis can be tested.

Estimate of $\sigma^2$ from Variation Among Means. The value of
$\sigma_{\bar X}^2$ can be estimated by computing the variance of the means
shown in Table 9.2 around the mean of the entire group. These three
means may be treated as a sample of three individuals, so that their
variance has 3 − 1 = 2 degrees of freedom. Then

$$s_{\bar X}^2 = \frac{\sum_{j}(\bar X_j-\bar X)^2}{k-1}$$

which in this problem is

$$s_{\bar X}^2 = \frac{(\bar X_1-\bar X)^2 + (\bar X_2-\bar X)^2 + (\bar X_3-\bar X)^2}{2} = \frac{\sum_{j=1}^{3}(\bar X_j-\bar X)^2}{2}$$

is an unbiased estimate of $\sigma_{\bar X}^2 = \sigma^2/11$ and therefore

$$11 s_{\bar X}^2 = \frac{11\sum_{j=1}^{3}(\bar X_j-\bar X)^2}{2}$$

is an unbiased estimate of $\sigma^2$ if the population means are equal.
In general if there are k samples of N cases each,

(9.1)    $N s_{\bar X}^2 = \dfrac{N\sum_{j=1}^{k}(\bar X_j-\bar X)^2}{k-1}$

is the estimate of $\sigma^2$ obtained from variation among the means. It
is usually called the “mean square between group means," or the
“mean square for variation between means” or the “mean square
between groups” or “mean square for means."
The numerator of Formula (9.1) is the sum of squares between
groups. The denominator is the number of degrees of freedom.
Applying Formula (9.1) to the data of Table 9.2 we obtain

$$11 s_{\bar X}^2 = \tfrac{11}{2}\left[(88.82-96.76)^2 + (90.45-96.76)^2 + (111.0-96.76)^2\right] = \tfrac{3362}{2} = 1681.$$

Computing routines which are algebraically equivalent to that
of Formula (9.1) but which demand less arithmetic labor and produce
smaller rounding errors are furnished by Formulas (9.2)
and (9.3).
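As a check on the arithmetic of Formula (9.1), the computation can be sketched in a few lines. The group totals and group size are taken from Table 9.2; the variable names are ours, and unrounded means are used, so the result differs slightly from the hand computation with means rounded to two decimals.

```python
# Group totals and common group size taken from Table 9.2.
totals = {"A": 977, "B": 995, "C": 1221}
N = 11                      # cases per group
k = len(totals)             # number of groups

group_means = {g: t / N for g, t in totals.items()}
grand_mean = sum(totals.values()) / (N * k)

# Numerator of Formula (9.1): the sum of squares between groups.
ss_between = N * sum((m - grand_mean) ** 2 for m in group_means.values())

# Formula (9.1): divide by the k - 1 degrees of freedom.
ms_between = ss_between / (k - 1)

print(round(ss_between, 1), round(ms_between, 1))
```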
(9.2)    $N s_{\bar X}^2 = \dfrac{\frac{1}{N}\sum_{j}\left(\sum_{i} X_{ij}\right)^2 - \frac{1}{Nk}\left(\sum_{j}\sum_{i} X_{ij}\right)^2}{k-1}$

(9.3)    $N s_{\bar X}^2 = \dfrac{k\sum_{j}\left(\sum_{i} X_{ij}\right)^2 - \left(\sum_{j}\sum_{i} X_{ij}\right)^2}{Nk(k-1)}$

Applying these formulas to the data of Table 9.2 we have by (9.2)

$$N s_{\bar X}^2 = \frac{\frac{1}{11}(977^2 + 995^2 + 1221^2) - \frac{1}{33}(3193)^2}{2} = 1680.8$$

and by (9.3)

$$N s_{\bar X}^2 = \frac{3(977^2 + 995^2 + 1221^2) - 3193^2}{11(3)(2)} = 1680.8$$

Estimate of $\sigma^2$ from Variation within Groups. Formula (8.10)
was stated on page 193 as providing an estimate of $\sigma^2$ based upon
the pooled variance of several groups. In the present problem
the estimate of $\sigma^2$ thus obtained would be

$$\frac{(N_1-1)s_1^2 + (N_2-1)s_2^2 + (N_3-1)s_3^2}{N_1+N_2+N_3-3} = \frac{10s_1^2 + 10s_2^2 + 10s_3^2}{11+11+11-3}$$

This is usually called the “mean square within groups.” In general
if there are k groups of N cases each, $\sigma^2$ is estimated as

(9.4)    $s^2 = \dfrac{s_1^2 + s_2^2 + \cdots + s_k^2}{k} = \dfrac{\sum_{j=1}^{k}\sum_{i=1}^{N}(X_{ij}-\bar X_j)^2}{k(N-1)}$

The numerator of (9.4) is the sum of squares within groups. The 
denominator is the number of degrees of freedom. 

Computing routines which are equivalent to that of Formula
(9.4) but which involve less arithmetic and less rounding error are
provided by Formulas (9.5) and (9.6).

(9.5)    $s^2 = \dfrac{\sum_{j}\sum_{i} X_{ij}^2 - \frac{1}{N}\sum_{j}\left(\sum_{i} X_{ij}\right)^2}{(N-1)k}$

(9.6)    $s^2 = \dfrac{N\sum_{j}\sum_{i} X_{ij}^2 - \sum_{j}\left(\sum_{i} X_{ij}\right)^2}{N(N-1)k}$

Applying Formula (9.4) to the data of Table 9.2 we have

$$s^2 = \frac{11889.7 + 28374.7 + 21404.0}{3(10)} = \frac{61668.4}{30} = 2055.6$$

This computation is simple only because the numerator had already
been worked out in Table 9.2. Applying Formulas (9.5)
and (9.6) we have

$$s^2 = \frac{373977 - \frac{1}{11}(977^2 + 995^2 + 1221^2)}{(10)(3)} = 2055.6$$

and

$$s^2 = \frac{11(373977) - (977^2 + 995^2 + 1221^2)}{(11)(10)(3)} = 2055.6$$
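The agreement of Formulas (9.4), (9.5), and (9.6) can be verified directly from the scores of Table 9.2. The following is a minimal sketch; the variable names are ours and nothing beyond the tabled scores is assumed.

```python
# Archery scores of Table 9.2, one list per group.
groups = {
    "A": [99, 114, 51, 78, 134, 71, 66, 33, 146, 80, 105],
    "B": [104, 26, 136, 87, 187, 34, 106, 63, 155, 68, 29],
    "C": [41, 83, 61, 189, 88, 80, 141, 112, 141, 112, 173],
}
N = 11                      # cases per group
k = len(groups)             # number of groups

totals = [sum(scores) for scores in groups.values()]
sum_of_squares = sum(x * x for scores in groups.values() for x in scores)
sum_sq_totals = sum(t * t for t in totals)

# Formula (9.4): squared deviations from each group's own mean.
ss_within = sum(
    (x - sum(scores) / N) ** 2 for scores in groups.values() for x in scores
)
ms_within_94 = ss_within / (k * (N - 1))

# Formula (9.5): raw sum of squares minus the squared totals over N.
ms_within_95 = (sum_of_squares - sum_sq_totals / N) / ((N - 1) * k)

# Formula (9.6): integer arithmetic until the final division.
ms_within_96 = (N * sum_of_squares - sum_sq_totals) / (N * (N - 1) * k)

print(round(ms_within_94, 1), round(ms_within_95, 1), round(ms_within_96, 1))
```

Formula (9.6) postpones every division to the last step, which is why it accumulates the least rounding error when the work is done by hand.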


The Variance Ratio. We have found two estimates of $\sigma^2$,
namely 1681 and 2056. Estimates computed in this way have the
property that they are independently distributed. See page 208
for further discussion of this property. A third estimate of $\sigma^2$
may be obtained from the deviations of all 33 scores around the
grand mean $\bar X$ but this estimate is not distributed independently
of either of the other estimates and therefore cannot be used in
the F ratio about to be described.

In Chapter 8 it was said that any ratio of the form $(N-1)s^2/\sigma^2$
or $ns^2/\sigma^2$ has a chi-square distribution with $n = N - 1$ degrees of
freedom. It was also said in that chapter that if $s_1^2$ and $s_2^2$ are two
independent estimates of the same variance $\sigma^2$, the ratio $s_1^2/s_2^2$
has an F distribution with $n_1$ degrees of freedom for the numerator
and $n_2$ for the denominator. Now since

$\dfrac{n_1 s_1^2}{\sigma^2}$ may be called $\chi_1^2$ and $\dfrac{n_2 s_2^2}{\sigma^2}$ called $\chi_2^2$,

$\dfrac{s_1^2}{\sigma^2}$ may be called $\dfrac{\chi_1^2}{n_1}$ and $\dfrac{s_2^2}{\sigma^2}$ called $\dfrac{\chi_2^2}{n_2}$.

Then

$$\frac{s_1^2}{s_2^2} = \frac{\chi_1^2/n_1}{\chi_2^2/n_2}$$

and any expression which can be reduced to this form has an F
distribution if $\chi_1^2/n_1$ and $\chi_2^2/n_2$ come from independent estimates of
the same variance. Therefore in the problem under consideration
the ratio of the mean square between group means to the mean
square within groups is distributed as F with 2 degrees of freedom
for the numerator and 30 for the denominator, or in general with
$k - 1$ degrees of freedom for the numerator and $Nk - k$ for the
denominator:

(9.7)    $F = \dfrac{\text{mean square between group means}}{\text{mean square within groups}} = \dfrac{\text{sum of squares between groups}}{\text{sum of squares within groups}} \cdot \dfrac{Nk-k}{k-1}$

(9.8)    or $F = \dfrac{N\sum_{j}(\bar X_j-\bar X)^2}{\sum_{j}\sum_{i}(X_{ij}-\bar X_j)^2} \cdot \dfrac{Nk-k}{k-1}$


Analysis of the Archery Data. The computations already made 
from the data of Table 9.2 may now be summarized in Table 
9.3, which is in the general form customarily used for such prob- 
lems. Usually the value read from an F table with which the 
observed F is to be compared would also be included in the table, 
but as that has not yet been discussed it will not be shown here. 
Interpretation of the value F = .82 depends upon the selection of
a critical region and will be discussed in the following section.


TABLE 9.3 Analysis of Variance of 33 Archery Scores

Source of variation              Sum of squares   Degrees of freedom   Mean square   F
Total group                      65030            32
Between group means              3362             2                    1681          .82
Among archers within groups      61668            30                   2056
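The entries of Table 9.3 can be reproduced from the raw scores. The sketch below is one possible arrangement of the computation, not the authors' worksheet; the function name `one_way_anova` and the layout of its result are our own choices.

```python
def one_way_anova(groups):
    """Return the rows of a one-way analysis-of-variance table.

    `groups` is a list of lists of scores; group sizes may differ.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sum of squares between group means, weighted by group size.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares of individuals around their own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "total": (ss_between + ss_within, n_total - 1),
        "between": (ss_between, df_between, ms_between),
        "within": (ss_within, df_within, ms_within),
        "F": ms_between / ms_within,
    }

# Scores of Table 9.2 for Groups A, B, and C.
archery = [
    [99, 114, 51, 78, 134, 71, 66, 33, 146, 80, 105],
    [104, 26, 136, 87, 187, 34, 106, 63, 155, 68, 29],
    [41, 83, 61, 189, 88, 80, 141, 112, 141, 112, 173],
]
table = one_way_anova(archery)
for source in ("total", "between", "within"):
    print(source, [round(v, 1) for v in table[source]])
print("F =", round(table["F"], 2))
```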


Critical Region. From Table X in the Appendix, a critical
region with either $\alpha = .05$ or $\alpha = .01$ can be obtained. For testing
the hypothesis under consideration, the variance ratio has two
degrees of freedom for the numerator mean square ($n_1 = 2$) and
30 for the denominator ($n_2 = 30$). The corresponding cell of the
table contains the entries 3.32 and 5.39. From these it is to be
understood that

$P(F > 3.32 \mid n_1 = 2, n_2 = 30) = .05$
$P(F > 5.39 \mid n_1 = 2, n_2 = 30) = .01$

or $P(F < 3.32 \mid n_1 = 2, n_2 = 30) = .95$
$P(F < 5.39 \mid n_1 = 2, n_2 = 30) = .99$

or $F_{.95} = 3.32$ and $F_{.99} = 5.39$

Thus for $\alpha = .05$ the region of acceptance is $F < 3.32$ and the region
of rejection is $F > 3.32$; while for $\alpha = .01$ the region of acceptance
is $F < 5.39$ and the region of rejection is $F > 5.39$. It may be
assumed that values precisely 5.39 and 3.32 will never occur, but
values equal to these within a given margin of rounding error may
appear. The research worker should make up his mind before he
sees the results of his computations whether such values shall be
considered to fall in the region of acceptance or of rejection.
If he decides to place them in the region of rejection,
he would define that region as $F \ge 5.39$ for $\alpha = .01$, and similarly for
$\alpha = .05$.
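The decision rule, including the advance choice about borderline values, can be written down explicitly. This is a sketch only: the dictionary `CRITICAL` holds just the two tabled values quoted above for $n_1 = 2$ and $n_2 = 30$, and the function name and the `boundary_rejects` flag are hypothetical.

```python
# Upper-tail points for n1 = 2, n2 = 30, as quoted in the text from Table X.
CRITICAL = {0.05: 3.32, 0.01: 5.39}

def decide(f_observed, alpha, boundary_rejects=True):
    """Place an observed F in the region of acceptance or of rejection.

    `boundary_rejects` records the decision, made before seeing the data,
    about where a value equal to the tabled entry shall fall.
    """
    critical = CRITICAL[alpha]
    if f_observed > critical:
        return "reject"
    if f_observed == critical:
        return "reject" if boundary_rejects else "accept"
    return "accept"

# The observed F for the archery data.
print(decide(0.82, 0.05), decide(0.82, 0.01))
```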

The observed value F = .82 falls well within the region of acceptance
and one must conclude that the data are consistent with
the hypothesis that scores obtained at the 50-yard range are not
influenced by the order in which that range is shot within the round.

In the present test the critical region has been placed in the
upper tail of the distribution while for the problems considered in
Chapter 8 it was in both tails. The difference in the logic of the
two situations needs to be examined.

In Chapter 8 the values of $s_1^2$ and $s_2^2$ used in the variance ratio
were obtained from two independent samples which by hypothesis
came from populations having the same variance. Too
great a discrepancy between $s_1^2$ and $s_2^2$ is damaging to that hypothesis
regardless of which variance is larger. Therefore the critical
region must include both tails of the distribution so that the
hypothesis will be rejected if

either $s_1^2/s_2^2$ is very small or $s_1^2/s_2^2$ is very large

and this is the same as if

$s_2^2/s_1^2$ is very large or $s_2^2/s_1^2$ is very small.

The procedure is to place the larger variance in the numerator and
to consider the probabilities associated with the two-sided critical
region to be not $\alpha$ but $2\alpha$. Significance levels read from Table X
for such problems will be .02 and .10. The study of Figures 9-2
and 9-3 on pages 208 and 209 will further clarify this point.

In the problem under consideration in the present chapter,
the two variances are obtained from the same sample. The variance
within groups estimated by Formulas (9.4), (9.5), or (9.6),
is an unbiased estimate of $\sigma^2$ whether the hypothesis tested is true
or not. If the hypothesis that $\mu_1 = \mu_2 = \cdots = \mu_k = \mu$ is true, the
variance between groups is also an unbiased estimate of $\sigma^2$. However,
if the groups have different means in the population, the
mean square between groups will have an expectation larger than $\sigma^2$.
Form of the F Distribution. As this curve has a different
form for every possible pair of numbers $n_1$ and $n_2$, the attempt to
visualize it is not very rewarding. It is always skewed with range
from 0 to infinity. Negative values of F cannot occur because both
numerator and denominator are necessarily positive. The highest
point on the F curve occurs at $F = \dfrac{n_2(n_1-2)}{n_1(n_2+2)}$ but that fact is not
very important. Its mean is at $F = \dfrac{n_2}{n_2-2}$ which is slightly above
1.0. Falsity of the hypothesis $\mu_1 = \mu_2 = \cdots = \mu_k$ cannot possibly
make the mean square between means smaller than $\sigma^2$ and must
make it greater. Therefore in problems of this type the mean
square between groups is always placed in the numerator and the
mean square within groups in the denominator regardless of
which is the larger. Then very large values of F throw doubt on
the hypothesis but very small ones do not. As large values of F
occur in the upper or right-hand tail of the sampling distribution
of F, the critical region is in the right tail only. Significance levels
obtained from Table X will be .01 and .05.
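The two expressions for the mode and the mean are easy to evaluate. The sketch below assumes the standard conditions under which they hold ($n_1 > 2$ for the mode, $n_2 > 2$ for the mean); the function names are ours. The values for $(n_1, n_2) = (4, 20)$ and $(20, 4)$ agree with those printed under Figures 9-2 and 9-3.

```python
def f_mode(n1, n2):
    """Highest point of the F curve, n2(n1 - 2) / (n1(n2 + 2)); needs n1 > 2."""
    if n1 <= 2:
        raise ValueError("this expression for the mode assumes n1 > 2")
    return n2 * (n1 - 2) / (n1 * (n2 + 2))

def f_mean(n1, n2):
    """Mean of the F distribution, n2 / (n2 - 2); needs n2 > 2.

    n1 is accepted only for symmetry with f_mode; the mean does not depend on it.
    """
    if n2 <= 2:
        raise ValueError("the mean exists only for n2 > 2")
    return n2 / (n2 - 2)

for n1, n2 in [(4, 20), (20, 4), (2, 30)]:
    mode = f_mode(n1, n2) if n1 > 2 else None
    print(n1, n2, mode, round(f_mean(n1, n2), 2))
```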


Fig. 9-1. Sampling distribution of F when $n_1 = 3$ and $n_2 = 16$ with empirical distribution of the 34 sample values. (Horizontal axis: scale of F.)


One may ask what can produce a very small F, since failure 
of the hypothesis cannot make it small. When a suspiciously 
small F occurs one asks if computations are correct, if the cases 
were selected at random, if the data were in any sense artificial 
or improperly gathered. For example, instructors have been 
known to make up examples from artificial data yielding F's that 
are improbably small. If there appears to be nothing wrong with 
the data or the computation, one merely ascribes the situation
to chance. When F < 1, it is customary to say at once that the 
hypothesis is accepted without referring to probability tables. 
However, if F is very small one may suspect that it represents a
non-chance situation and may wish to make a test of significance.
The procedure then is to take the reciprocal of F and to compare it
with the tabular entry for which $n_2$ is the degrees of freedom for
the numerator and $n_1$ for the denominator. For example, for the
Schroeder data F = .82, so 1/F = 1.22. In the cell for which
$n_1 = 30$ and $n_2 = 2$ we find $F_{.95} = 19.46$ and $F_{.99} = 99.47$. If,
therefore, we had the complete distribution for the variance ratio
with $n_1 = 2$ and $n_2 = 30$, it would show us that

$F_{.01} = 1/99.47 = .01005$,  $F_{.05} = 1/19.46 = .0514$,  $F_{.95} = 3.32$,  $F_{.99} = 5.39$

while the complete distribution for the variance ratio with $n_1 = 30$
and $n_2 = 2$ would show

$F_{.01} = 1/5.39 = .19$,  $F_{.05} = 1/3.32 = .301$,  $F_{.95} = 19.46$,  $F_{.99} = 99.47$

It may interest the reader to see pictures of the curve for selected
values of $n_1$ and $n_2$. Figure 6-6 on page 139 shows the form of
the curve for $n_1 = 4$ and $n_2 = 4$. Figure 9-1 shows the form for
$n_1 = 3$ and $n_2 = 16$ and also the empirical distribution of 34 sample
values of F obtained by students in a statistics class. These are
the values of the fraction A/B obtained on row 14 of Worksheet
III in Chapter 6.

Fig. 9-2. Sampling distribution of F when $n_1 = 4$ and $n_2 = 20$. (Horizontal axis: value of F.)

5% of area lies above F = 2.87, at A
5% of area lies below F = .17, at a
20% of area lies above F = 1.65, at B
20% of area lies below F = .41, at b
57% of area lies below F = 1.
Mode = .45    Mean = 1.11    Median = .9
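The reciprocal relation used above, namely that a lower percentage point of F with $(n_1, n_2)$ degrees of freedom is the reciprocal of the corresponding upper point with $(n_2, n_1)$ degrees of freedom, can be sketched as a table lookup. The dictionary `UPPER` contains only the four tabled entries quoted in the text; the function name is hypothetical.

```python
# Upper-tail points quoted in the text from Table X, keyed by (n1, n2)
# and then by the percentile (95 or 99).
UPPER = {
    (2, 30): {95: 3.32, 99: 5.39},
    (30, 2): {95: 19.46, 99: 99.47},
}

def lower_point(n1, n2, percent):
    """Lower percentage point of F(n1, n2); percent=5 gives F_.05.

    Uses the relation F_p(n1, n2) = 1 / F_(1-p)(n2, n1).
    """
    return 1.0 / UPPER[(n2, n1)][100 - percent]

for n1, n2 in [(2, 30), (30, 2)]:
    print(n1, n2, round(lower_point(n1, n2, 1), 5), round(lower_point(n1, n2, 5), 4))
```

This is the reason a table restricted to values above 1 is sufficient: every lower point can be recovered from an upper point with the degrees of freedom interchanged.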

Figure 9-2 shows the curve for $n_1 = 4$ and $n_2 = 20$ while Figure
9-3 shows the curve for $n_1 = 20$ and $n_2 = 4$. These two
curves may now be studied together. If 9-2 represents the distribution
of $F = s_1^2/s_2^2$ then 9-3 represents the distribution of
$F' = 1/F = s_2^2/s_1^2$. Either curve alone shows the probability for
every possible value of the ratio between the two variances. However,
ratios smaller than 1 would be closely crowded on the base
line between 0 and 1 and would have to be given in the table
with a large number of decimal places in order to distinguish
probability levels. The segment of the horizontal axis in Figure
9-2 to the right of the point 1 represents the same situations as
the segment of the horizontal axis of 9-3 to the left of 1 and vice
versa. Together the two segments to the right of 1 represent all
possible situations. Therefore it is satisfactory to tabulate only
the right-hand segment of each curve, and this is the reason that
Table X contains no entries smaller than 1.

Independence of the Components of the Sum of Squares. The
two components into which the sum of squares has been analyzed
have the surprising property of being independently distributed.
This property of independence holds in spite of the fact that the
two components are based on the same measures. As these two
components of the sum of squares are proportional to the two estimates
of the variance, those estimates also are independently distributed.
The word "independent" implies that if many samples
of the same size from the same population are examined, the value
of one of these components in a sample is in no way predictive
of the value of the other. By way of illustration, there are available
computations made by 46 students of the variance between
means of 4 samples of 5 cases and the variance among individuals
in these same samples.

Fig. 9-3. Sampling distribution of F when $n_1 = 20$ and $n_2 = 4$. (Horizontal axis: value of F.)

5% of area lies above F = 5.80, at A'
5% of area lies below F = .35, at a'
20% of area lies above F = 2.45, at B'
20% of area lies below F = .61, at b'
43% of area lies below F = 1.
Mode = .6    Mean = 2.0    Median = 1.1

These two components have been distributed in a scatter dia- 
gram in Figure 9-4. The correlation is r = — .05, a value which is not 
significantly different from zero by the test described in Chapter 10. 

Unless the two components of variance are independent, the 
ratio of the related variances does not have an F distribution. 
The algebraic relationship of Formula (9.9) holds for all samples 
regardless of the population from which they were obtained but 
this is a relationship within a sample and does not establish inde- 
pendence from sample to sample. 
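The classroom demonstration can be imitated by simulation. The sketch below is not a reproduction of the students' data: it draws 2000 artificial experiments (rather than 46) of 4 samples of 5 cases from a normal population with mean 0 and variance 1, both of which are our assumptions, and computes the correlation between the two components of the sum of squares.

```python
import random

def simulate(n_samples=2000, k=4, n=5, seed=1):
    """Draw repeated sets of k samples of n cases from one normal population
    and return the between-means and within-groups sums of squares."""
    rng = random.Random(seed)
    between, within = [], []
    for _ in range(n_samples):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [sum(g) / n for g in groups]
        grand = sum(means) / k
        between.append(n * sum((m - grand) ** 2 for m in means))
        within.append(sum((x - m) ** 2 for g, m in zip(groups, means) for x in g))
    return between, within

def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

between, within = simulate()
r = pearson_r(between, within)
print("r =", round(r, 3))
```

A correlation near zero is what independence leads one to expect. A zero correlation alone would not prove independence, so the simulation illustrates the property rather than establishing it.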


Fig. 9-4. Joint frequency distribution of the sum of squares among means $5\sum(\bar X_j-\bar X)^2$ and the sum of squares within groups $\sum\sum(X_{ij}-\bar X_j)^2$ from 46 samples of 5 cases. r = −.05. (Axes: value of $\sum\sum(X_{ij}-\bar X_j)^2$ and value of $5\sum(\bar X_j-\bar X)^2$.)


Algebraic Relations in Analysis of Variance. While the computing
procedure described in the preceding pages can be carried
out when numbers in the subsamples are equal, slight modifications
are necessary if those numbers are unequal. Moreover
certain algebraic relations among the sums of squares can be used
to simplify the computations considerably. Those relations will
be described in the following paragraphs.

The double subscript notation described at the beginning of
this chapter is particularly useful for analysis of variance. Let
$X_{ij}$ represent the ith observation in group j, or subsample j. Because
the formulas look rather long and imposing, though they
are not essentially difficult, we shall introduce the symbol T as
sum or total of a set of scores.


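$$\sum_{i=1}^{N_1} X_{i1} = T_1 \quad\text{and}\quad \bar X_1 = T_1/N_1$$

$$\sum_{i=1}^{N_j} X_{ij} = T_j \quad\text{and}\quad \bar X_j = T_j/N_j$$

$$\sum_{j=1}^{k} N_j = N$$

$$\sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij} = T \quad\text{and}\quad \bar X = T/N$$

Then for any one subsample, say the jth, the sum of the squares
of the deviations of individual observations from the subsample
mean is

$$\sum_{i=1}^{N_j}(X_{ij}-\bar X_j)^2 = \sum_{i=1}^{N_j} X_{ij}^2 - \frac{T_j^2}{N_j}$$

For brevity this expression will be called the sum of squares.
The total sum of squares can be partitioned as follows:

(9.9)    $\sum_{j=1}^{k}\sum_{i=1}^{N_j}(X_{ij}-\bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j}(X_{ij}-\bar X_j)^2 + \sum_{j=1}^{k} N_j(\bar X_j-\bar X)^2$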

Equation (9.9) is very important, is in fact the basic relationship
underlying the process of “analysis of variance” described in this
chapter. The total variation described by the sum of squares of
all N cases around the total mean has been broken up into two
components each of which is expressive of a type of variation.
Thus it is the sum of squares rather than the variance which is
analyzed. The sum of squares can be analyzed into meaningful
components in a number of different ways, some to be described
here and some in subsequent chapters. In Formula (9.9) the
component $\sum_{j=1}^{k}\sum_{i=1}^{N_j}(X_{ij}-\bar X_j)^2$ represents variation among individuals
within groups. It would be zero only if all individuals in
each group had the same score. The component $\sum_{j=1}^{k} N_j(\bar X_j-\bar X)^2$
represents variation between group means. It would be zero only
if all samples had the same mean.
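The partition in Formula (9.9) holds for any set of scores, whatever the group sizes. A small check, using three hypothetical groups of unequal size chosen so that the arithmetic comes out in whole numbers, might look like this:

```python
def partition(groups):
    """Return (total, within, between) sums of squares as in Formula (9.9)."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(x for g in groups for x in g) / n_total

    # Left side of (9.9): deviations of every score from the grand mean.
    total = sum((x - grand_mean) ** 2 for g in groups for x in g)
    # First component: deviations of scores from their own group mean.
    within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Second component: deviations of group means from the grand mean,
    # each weighted by the number of cases N_j in the group.
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    return total, within, between

# Hypothetical scores: groups of 3, 2, and 4 cases.
total, within, between = partition([[3, 5, 7], [2, 4], [6, 8, 10, 12]])
print(round(total, 6), round(within, 6), round(between, 6))
```

For these scores the group means are 5, 3, and 9, so the within component is 8 + 2 + 20 = 30, and the two components add to the total of 86.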
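Some additional algebraic relationships are useful to simplify
calculations.
A. When the number of cases is the same for all groups, so that
$N_1 = N_2 = \cdots = N_k = M$, Formulas (9.10), (9.11), and (9.12)
may be used for the sums of squares:

(9.10)    Total: $\sum_{j=1}^{k}\sum_{i=1}^{M}(X_{ij}-\bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{M} X_{ij}^2 - \dfrac{T^2}{kM}$

(9.11)    Between * means: $M\sum_{j=1}^{k}(\bar X_j-\bar X)^2 = \dfrac{1}{M}\sum_{j=1}^{k} T_j^2 - \dfrac{T^2}{kM}$

(9.12)    Within groups: $\sum_{j=1}^{k}\sum_{i=1}^{M}(X_{ij}-\bar X_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{M} X_{ij}^2 - \dfrac{1}{M}\sum_{j=1}^{k} T_j^2$

B. When the number of cases differs from group to group, Formulas
(9.13), (9.14), and (9.15) will be needed for the sums of
squares.

(9.13)    Total: $\sum_{j=1}^{k}\sum_{i=1}^{N_j}(X_{ij}-\bar X)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij}^2 - \dfrac{T^2}{N}$

(9.14)    Between means: $\sum_{j=1}^{k} N_j(\bar X_j-\bar X)^2 = \sum_{j=1}^{k}\dfrac{T_j^2}{N_j} - \dfrac{T^2}{N}$

(9.15)    Within groups: $\sum_{j=1}^{k}\sum_{i=1}^{N_j}(X_{ij}-\bar X_j)^2 = \sum_{j=1}^{k}\sum_{i=1}^{N_j} X_{ij}^2 - \sum_{j=1}^{k}\dfrac{T_j^2}{N_j}$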

Computation Procedures Applicable when Subgroups are not 
of Uniform Size. Data from a study by Harrington‘ provide an 
illustration. He made a study of the recommendations written 
for a selected group of applicants for teaching positions in different 
high school subjects, to determine what characteristics of these 
recommendations, if any, are associated with success in obtaining a 
position. The number of recommendations varied from candidate 
to candidate. “Successful” candidates were those who were
elected to the position for which they made application. “Unsuccessful”
candidates were those not elected to these same
positions, although they might subsequently have been elected
to other positions. In all, 59 independent similar positions were
available, providing 59 successful and 134 unsuccessful candidates,
for whom there were 1144 recommendations.

* Whenever the authors can remember to do so, they are using the phrase “between
groups" or "between means" whether there are two groups or more than two. This
flouting of usual grammatical convention regarding “between” and “among” is deliberate.
Experience indicates that use of the phrase “between means” when there are two
samples and “among means” when there are more than two introduces a verbal complexity
which causes some students to confuse the latter phrase with the phrase "among
individuals within groups."

The recommendations were scored on a large number of traits. 
The data presented here relate to length of recommendation, a 
score being the “number of items commented on not unfavorably” 
in the recommendation. Table 9.4 shows the basic data for the 
6 successful candidates and Table 9.6 for the 14 unsuccessful can- 
didates, selected from the entire group to illustrate procedures. 


TABLE 9.4 Scores on Length of Recommendations Written
for 6 Successful Candidates for Teaching Positions.*

Candidate   Recommendation scores        N     $\sum X$   $\sum X^2$
1           9, 12, 9, 19, 6              5     55         703
2           10, 12, 7, 12, 6             5     47         473
3           4, 6, 5, 5, 8                5     28         166
4           4, 5, 11, 9, 14, 10, 10      7     63         639
5           8, 17, 8, 19                 4     52         778
6           5, 6, 13, 3                  4     27         239
Total                                    30    272        2998

$T^2/N = (272)^2/30 = 2466.13$
Total sum of squares = 2998 − 2466.13 = 531.87
Sum of squares between means = $(55)^2/5 + (47)^2/5 + (28)^2/5 + (63)^2/7 + (52)^2/4 + (27)^2/4 - T^2/N$ = 2628.85 − 2466.13 = 162.72
Sum of squares within groups = 531.87 − 162.72 = 369.15
* Data from Harrington.


TABLE 9.5 Analysis of Variance of Scores on Length of Recommendations for
6 Successful Candidates for Teaching Positions

Source of variation   Sum of squares   Degrees of freedom   Mean square   F      $F_{.95}$
Total                 531.87           29
Between means         162.72           5                    32.54         2.12   2.62
Within groups         369.15           24                   15.38

Since the F obtained in Table 9.5 is less than $F_{.95}$ the hypothesis
that the means of recommendation scores do not differ among the
6 successful candidates may be accepted.
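Formulas (9.13), (9.14), and (9.15) need only the group sizes, the group totals, and the overall sum of squared scores, so Table 9.5 can be rebuilt from the summary columns of Table 9.4 alone. A minimal sketch, with a function name and result layout of our own choosing:

```python
def anova_from_summaries(sizes, totals, sum_of_squares):
    """One-way analysis of variance from N_j, T_j, and the overall sum of X^2."""
    n_total = sum(sizes)
    correction = sum(totals) ** 2 / n_total                 # T^2 / N
    ss_total = sum_of_squares - correction                  # Formula (9.13)
    ss_between = sum(t * t / n for t, n in zip(totals, sizes)) - correction  # (9.14)
    ss_within = ss_total - ss_between                       # (9.15), by subtraction
    df_between = len(sizes) - 1
    df_within = n_total - len(sizes)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_total": ss_total,
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df": (df_between, df_within),
        "F": f_ratio,
    }

# Summary columns of Table 9.4 for the 6 successful candidates.
result = anova_from_summaries(
    sizes=[5, 5, 5, 7, 4, 4],
    totals=[55, 47, 28, 63, 52, 27],
    sum_of_squares=2998,
)
print({key: (round(v, 2) if isinstance(v, float) else v) for key, v in result.items()})
```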
Recommendation scores for 14 unsuccessful candidates are
shown in Table 9.6. The student should complete the analysis
and should verify that F = 1.31 and that $F_{.95} = 1.9$. Hence the
hypothesis that the mean length of recommendations does not differ
among unsuccessful candidates may be accepted.


TABLE 9.6 Scores on Length of Recommendations Written for 14 Unsuccessful 
Candidates for Teaching Positions 


Candidate   Recommendation scores   N   $\sum X$   $\sum X^2$
1 12 5 20 11 2 
2 7 ll 14 
3 9,3 6 6020 4 8606 
4 29110. 72. 8.13. 6 
5 8 77 17 
6 > et SLE T 
T ИЛЕ 72219 06.11, 0.0 
8 415 4 15 8 
9 Brin Cael eT eB 84 -6 
10 6 9 10 4 
11 E 4176728 "9. 16 
12 24 20 9 9 
13 Doi ы т, Ти A |! 
14 4 5232 7 
Total 77 723 9009 


The information just obtained about successful and unsuccess- 
ful candidates may be used to advantage in pursuing one of the 
purposes of Harrington’s study, namely to determine whether 
successful candidates differ from unsuccessful in mean length of 
recommendation. Since we accept the hypothesis that the means 
of candidates within the groups are equal all recommendations of 
each group can be regarded as coming from a single population. 
Hence the problem of testing the hypothesis that the mean recom- 
mendation length of successful candidates is equal to the mean 
recommendation length of unsuccessful may be reduced to a prob- 
lem in analysis of variance with two populations. The data may 
now be treated as consisting of two subsamples, one of 30 and one 
of 77 recommendations. The calculation follows: 


$$\frac{T^2}{N} = \frac{(272 + 723)^2}{30 + 77} = \frac{(995)^2}{107} = 9252.57$$

Total sum of squares = (2998 + 9009) − 9252.57 = 2754.43
Sum of squares between means
= $(272)^2/30 + (723)^2/77 - 9252.57$ = 2.25
Sum of squares within groups = 2754.43 − 2.25 = 2752.18


As we are now testing the hypothesis that the means of two
populations are equal, the t test for comparing the means of two
independent samples might have been used. The value of t
obtained by Formula (7.23), page 156, would be $\sqrt{.086} = .29$ since
$t = \sqrt{F}$ when F has 1 degree of freedom in the numerator.

The calculation of F is usually simpler than that for obtaining
t directly. However, the F table shows values for $F_{.95}$ and $F_{.99}$
only while the table of t gives a greater range of values. It is a
satisfactory procedure, when comparing the means of two independent
samples, to compute F, take its square root, and refer
the result to the table of "Student's" distribution.
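The relation $t = \sqrt{F}$ can be checked on the two-group summaries just used. The sketch below computes F from the totals and then computes t directly; we assume the direct formula is the usual pooled-variance t for two independent samples, which may differ in layout from Formula (7.23). Variable names are ours.

```python
import math

# Summaries for the two groups of recommendations (Tables 9.4 and 9.6).
n1, t1 = 30, 272            # successful candidates
n2, t2 = 77, 723            # unsuccessful candidates
sum_of_squares = 2998 + 9009

correction = (t1 + t2) ** 2 / (n1 + n2)
ss_total = sum_of_squares - correction
ss_between = t1 ** 2 / n1 + t2 ** 2 / n2 - correction
ss_within = ss_total - ss_between
ms_within = ss_within / (n1 + n2 - 2)

# One degree of freedom in the numerator, so the mean square equals ss_between.
f_ratio = ss_between / ms_within
t_from_f = math.sqrt(f_ratio)

# Pooled-variance t computed directly; ms_within is the pooled variance.
t_direct = (t1 / n1 - t2 / n2) / math.sqrt(ms_within * (1 / n1 + 1 / n2))

print(round(f_ratio, 3), round(t_from_f, 2), round(abs(t_direct), 2))
```

The square root loses the sign of the difference between the means, so the direction of the difference must be read from the means themselves.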


TABLE 9.7 Analysis of Variance for Comparing Length of Recommendations
of 6 Successful and 14 Unsuccessful Candidates

Source of variation   Sum of squares   Degrees of freedom   Mean square   F      $F_{.95}$
Total                 2754.43          106
Between means         2.25             1                    2.25          .086   3.94
Within groups         2752.18          105                  26.21

The observed value of F = .086 is so small that one is justified 
in questioning whether the selection of candidates was really 
random. As a matter of fact it was not. Harrington supplied 
the figures to illustrate his procedures and at that time there was 
no intention to use them for any purpose other than to examine 
his methodology. 

It is of interest, however, that even for his entire group Har- 
rington did not find mean length of recommendation significantly 
different for persons who secured positions and for their unsuccess- 
ful competitors. By a process too long to be described here, he 
obtained a quality score for each recommendation. These quality 
scores differentiated the candidates whereas length had not done 
so. For variation between 183 candidates the mean square was 
76.48/182 = .420. For variation of 1035 recommendation scores 
around the candidate means the mean square was 

208.70/(1035 − 183) = .245.
Therefore F = .420/.245 = 1.71. With 182 degrees of freedom for
the numerator and 852 for the denominator, $F_{.99} = 1.28$. The
candidates cannot be treated as having the same means of quality
scores.

Comparison of Means of Several Measures of the Same 
Individuals. Let us return now to the problem of the effect of 
varying the order with which ranges of different length are used 
in lessons of archery. When this problem was discussed at the 
beginning of this chapter three independent groups were used, each 
group shooting at the 50-yard range in a different order from the 
other two. An objection to that form of experimental design is 
that the sample differences were based not only on order of shoot- 
ing, but also on differences in skill among the groups, so that 
difference between orders is confounded with difference between 
groups. 

One way of eliminating group differences from the design is to 
compare the scores on the same group of subjects. For simplicity 
in illustration we will present scores for 11 subjects even though 
data are available for all 33 subjects. For all 11 subjects scores 
made at the 50-yard range when this range was shot first, second 
and third are presented in Table 9.8. 


TABLE 9.8 Scores of 11 Archery Students at the 50-yard Range for Varying 
Order of Shooting* 

            First    Second    Third 
Subject     order    order     order      Sum      Mean 
   1         114      182       123       419     139.67 
   2          99      121       119       339     113 
   3          51      100        35       186      62 
   4          78      144        72       294      98 
   5         134      125       201       460     153.33 
   6          71       38        49       158      52.67 
   7          66       67       107       240      80 
   8          33       89        51       173      57.67 
   9         146      159       157       462     154 
  10          80      101        83       264      88 
  11         105      113       113       331     110.33 
  Sum        977     1239      1110      3326 
  Mean      88.82   112.64    100.91              100.79 

* Data from Schroeder. 


It is important to notice the difference between the scores in 
Table 9.1 and those in Table 9.8. In the former the scores in 
any column can be rearranged in any arbitrary manner without 
changing the meaning of the data. In the latter any interchange 
of scores in a column must be accompanied by a corresponding 
interchange in every other column since the three scores in any 
row are linked by the fact that they belong to a given person. 
This difference plays an important part in determining the 
statistical model which is to be used in the analysis of the data. 

We shall suppose that each observation in Table 9.8 is derived 
from a normal population with a mean which depends both on 
the order of shooting the range and the person doing the shooting. 
Consider all the possible scores which might be made by the ith 
person shooting this range in the jth order to be a population and 
let the mean of that population be called $\mu_{ij}$. The single score 
recorded for that person at that order is a random deviate from 
$\mu_{ij}$. Thus each cell in Table 9.8 has its own population mean and 
each of the 33 scores recorded there is a random deviate from 
the population mean of the cell in which it stands. As both order 
and person may influence the mean, it is conceivable that the 
33 observations in the table come from 33 different means. These 
33 possible population means, together with certain row and 
column averages, are displayed in Table 9.9. In this table 

$$\mu_{1.} = \tfrac{1}{3}(\mu_{11} + \mu_{12} + \mu_{13})$$

and in general $\mu_{i.} = \tfrac{1}{3}(\mu_{i1} + \mu_{i2} + \mu_{i3})$. Similarly 

$$\mu_{.1} = \tfrac{1}{11}(\mu_{11} + \mu_{21} + \cdots + \mu_{11,1}) \quad \text{and} \quad \mu_{.j} = \tfrac{1}{11}\sum_{i=1}^{11} \mu_{ij}.$$

The total mean $\mu$ is a mean of the 33 cell means, also of the 11 row 
means, and also of the 3 column means: 

$$\mu = \frac{1}{33}\sum_{i=1}^{11}\sum_{j=1}^{3} \mu_{ij} = \frac{1}{11}\sum_{i=1}^{11} \mu_{i.} = \frac{1}{3}\sum_{j=1}^{3} \mu_{.j}$$

or in general, if there are r rows and c columns, 

(9.16)  $$\mu = \frac{1}{rc}\sum_{i=1}^{r}\sum_{j=1}^{c} \mu_{ij} = \frac{1}{r}\sum_{i=1}^{r} \mu_{i.} = \frac{1}{c}\sum_{j=1}^{c} \mu_{.j}$$

The row and column means are expressive of the physical 
aspects of the problem. The mean $\mu_{i.}$ is the skill of the ith 
person regardless of order of shooting. The mean $\mu_{.j}$ is the mean 
of the jth order regardless of subject. The mean $\mu$ is the average 
of all effects. The difference $\mu_{i.} - \mu$ may be regarded as the person 
component or the deviation of the skill of the ith person from 
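the skill of all 11 persons. Similarly the difference $\mu_{.j} - \mu$ is the 
order component. 

TABLE 9.9 Population Means of Scores in Table 9.8 

                          Order                      Mean of 
Subject       First       Second      Third          row means 
   1        $\mu_{11}$    $\mu_{12}$    $\mu_{13}$    $\mu_{1.}$ 
   2        $\mu_{21}$    $\mu_{22}$    $\mu_{23}$    $\mu_{2.}$ 
   3        $\mu_{31}$    $\mu_{32}$    $\mu_{33}$    $\mu_{3.}$ 
   4        $\mu_{41}$    $\mu_{42}$    $\mu_{43}$    $\mu_{4.}$ 
   5        $\mu_{51}$    $\mu_{52}$    $\mu_{53}$    $\mu_{5.}$ 
   6        $\mu_{61}$    $\mu_{62}$    $\mu_{63}$    $\mu_{6.}$ 
   7        $\mu_{71}$    $\mu_{72}$    $\mu_{73}$    $\mu_{7.}$ 
   8        $\mu_{81}$    $\mu_{82}$    $\mu_{83}$    $\mu_{8.}$ 
   9        $\mu_{91}$    $\mu_{92}$    $\mu_{93}$    $\mu_{9.}$ 
  10        $\mu_{10,1}$  $\mu_{10,2}$  $\mu_{10,3}$  $\mu_{10.}$ 
  11        $\mu_{11,1}$  $\mu_{11,2}$  $\mu_{11,3}$  $\mu_{11.}$ 
Mean of 
column means  $\mu_{.1}$  $\mu_{.2}$    $\mu_{.3}$    $\mu$ 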

To obtain information about the 33 unknown parameters of 
Table 9.9 from the 33 observations of Table 9.8 is a hopeless task 
unless the statistical model can be simplified in some way so that 
the number of parameters is smaller than the number of 
observations. We must make some assumption that will reduce the 
number of unknown parameters. An assumption useful for this purpose 
is that each cell mean $\mu_{ij}$ is the sum of the general mean, $\mu$, the 
component for the ith person, and the component for the jth 
order, that is 

(9.17)  $$\mu_{ij} = \mu + (\mu_{i.} - \mu) + (\mu_{.j} - \mu) = \mu_{i.} + \mu_{.j} - \mu$$

This assumption allows us to describe the 33 population means as 
in Table 9.10, which presents the statistical model that will be 
used in the analysis of the archery data in Table 9.8. 

The reader may verify that the entries in the right-hand 
column are means of the entries in the corresponding rows and that 
the entries in the lowest row are means of the column entries. 
Another point to note in the table is that although 15 parameters 
appear in Table 9.10 only 13 of these are independent. One 
restriction is introduced by the fact that $\mu$ is the mean of the 11 row 
means or the mean of the 3 column means. Another restriction 
is introduced by the fact that the mean of the row means equals 
the mean of the column means. Since we have now only 13 
parameters instead of 33 the problem is manageable. 

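TABLE 9.10 Revised Population Means of Scores in Table 9.8 

                                     Order                                             Mean of 
Subject   First                         Second                        Third                         row means 
   1      $\mu_{1.} + \mu_{.1} - \mu$   $\mu_{1.} + \mu_{.2} - \mu$   $\mu_{1.} + \mu_{.3} - \mu$   $\mu_{1.}$ 
   2      $\mu_{2.} + \mu_{.1} - \mu$   $\mu_{2.} + \mu_{.2} - \mu$   $\mu_{2.} + \mu_{.3} - \mu$   $\mu_{2.}$ 
   3      $\mu_{3.} + \mu_{.1} - \mu$   $\mu_{3.} + \mu_{.2} - \mu$   $\mu_{3.} + \mu_{.3} - \mu$   $\mu_{3.}$ 
   4      $\mu_{4.} + \mu_{.1} - \mu$   $\mu_{4.} + \mu_{.2} - \mu$   $\mu_{4.} + \mu_{.3} - \mu$   $\mu_{4.}$ 
   5      $\mu_{5.} + \mu_{.1} - \mu$   $\mu_{5.} + \mu_{.2} - \mu$   $\mu_{5.} + \mu_{.3} - \mu$   $\mu_{5.}$ 
   6      $\mu_{6.} + \mu_{.1} - \mu$   $\mu_{6.} + \mu_{.2} - \mu$   $\mu_{6.} + \mu_{.3} - \mu$   $\mu_{6.}$ 
   7      $\mu_{7.} + \mu_{.1} - \mu$   $\mu_{7.} + \mu_{.2} - \mu$   $\mu_{7.} + \mu_{.3} - \mu$   $\mu_{7.}$ 
   8      $\mu_{8.} + \mu_{.1} - \mu$   $\mu_{8.} + \mu_{.2} - \mu$   $\mu_{8.} + \mu_{.3} - \mu$   $\mu_{8.}$ 
   9      $\mu_{9.} + \mu_{.1} - \mu$   $\mu_{9.} + \mu_{.2} - \mu$   $\mu_{9.} + \mu_{.3} - \mu$   $\mu_{9.}$ 
  10      $\mu_{10.} + \mu_{.1} - \mu$  $\mu_{10.} + \mu_{.2} - \mu$  $\mu_{10.} + \mu_{.3} - \mu$  $\mu_{10.}$ 
  11      $\mu_{11.} + \mu_{.1} - \mu$  $\mu_{11.} + \mu_{.2} - \mu$  $\mu_{11.} + \mu_{.3} - \mu$  $\mu_{11.}$ 
Mean of 
column means  $\mu_{.1}$                $\mu_{.2}$                    $\mu_{.3}$                    $\mu$ 

To complete the statistical model we shall assume that each 
of the 33 populations is normal, with mean as indicated by its 
position in Table 9.10, and that all of these populations have 
the same variance $\sigma^2$. 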

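The observations and their row and column means may be 
designated by symbols as in Table 9.11. 

TABLE 9.11 Symbolism for Observations and their Means for 3 Scores on 
Each of 11 Subjects 

                         Order                     Mean           Sum for 
Subject      First       Second      Third         for row        row 
   1        $X_{11}$     $X_{12}$    $X_{13}$      $\bar{X}_{1.}$   $T_{1.}$ 
   2        $X_{21}$     $X_{22}$    $X_{23}$      $\bar{X}_{2.}$   $T_{2.}$ 
   3        $X_{31}$     $X_{32}$    $X_{33}$      $\bar{X}_{3.}$   $T_{3.}$ 
   4        $X_{41}$     $X_{42}$    $X_{43}$      $\bar{X}_{4.}$   $T_{4.}$ 
   5        $X_{51}$     $X_{52}$    $X_{53}$      $\bar{X}_{5.}$   $T_{5.}$ 
   6        $X_{61}$     $X_{62}$    $X_{63}$      $\bar{X}_{6.}$   $T_{6.}$ 
   7        $X_{71}$     $X_{72}$    $X_{73}$      $\bar{X}_{7.}$   $T_{7.}$ 
   8        $X_{81}$     $X_{82}$    $X_{83}$      $\bar{X}_{8.}$   $T_{8.}$ 
   9        $X_{91}$     $X_{92}$    $X_{93}$      $\bar{X}_{9.}$   $T_{9.}$ 
  10        $X_{10,1}$   $X_{10,2}$  $X_{10,3}$    $\bar{X}_{10.}$  $T_{10.}$ 
  11        $X_{11,1}$   $X_{11,2}$  $X_{11,3}$    $\bar{X}_{11.}$  $T_{11.}$ 
Mean for 
column      $\bar{X}_{.1}$  $\bar{X}_{.2}$  $\bar{X}_{.3}$  $\bar{X}$ 
Sum for 
column      $T_{.1}$     $T_{.2}$    $T_{.3}$                      $T$ 

Thus each cell of the table has an observed value $X_{ij}$ and an 
unknown population mean $\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu$. Each row has an 
observed mean $\bar{X}_{i.}$ and an unknown population mean $\mu_{i.}$. Each 
column has an observed mean $\bar{X}_{.j}$ and an unknown population mean 
$\mu_{.j}$. The table as a whole has an observed mean $\bar{X}$ and an 
unknown population mean $\mu$. 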

The sample estimate for $\mu_{ij}$ is $\bar{X}_{i.} + \bar{X}_{.j} - \bar{X}$. The expression 
$X_{ij} - \mu_{ij}$ is a random deviate. Therefore 

$$X_{ij} - (\bar{X}_{i.} + \bar{X}_{.j} - \bar{X}) = X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}$$

is the best sample estimate of the error in $X_{ij}$. Expressions for 
sample variances which will be needed in testing hypotheses will 
now be given. 

Estimate of $\sigma^2$ Based on Error. This estimate of $\sigma^2$ is obtained 
from the sample estimate of the error in $X_{ij}$. It measures the 
effect of random deviation of observed scores $X_{ij}$ from values 
expected on the basis of row and column means. If each $X_{ij}$ were 
exactly equal to $\mu_{ij}$ this error variance would be zero. We shall 
denote this estimate by the letter E. If $\mu_{ij} = \mu_{i.} + \mu_{.j} - \mu$, E is 
an unbiased estimate of $\sigma^2$ regardless of the values of the row and 
column means in the population. 

In general, if there are r rows and c columns, 

(9.18)  $$E = \frac{\sum_{i=1}^{r}\sum_{j=1}^{c} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2}{(r-1)(c-1)}$$

It is more easily computed by the equivalent formula 

(9.19)  $$E = \frac{\sum_i\sum_j X_{ij}^2 - \frac{1}{c}\sum_i \left(\sum_j X_{ij}\right)^2 - \frac{1}{r}\sum_j \left(\sum_i X_{ij}\right)^2 + \frac{1}{rc}\left(\sum_i\sum_j X_{ij}\right)^2}{(r-1)(c-1)}$$

or 

(9.20)  $$E = \frac{\sum_i\sum_j X_{ij}^2 - \frac{1}{c}\sum_i T_{i.}^2 - \frac{1}{r}\sum_j T_{.j}^2 + \frac{T^2}{N}}{(r-1)(c-1)}$$

where $T_{i.}$, $T_{.j}$, and $T$ are the row, column, and grand totals of 
Table 9.11 and $N = rc$. 

Estimate of $\sigma^2$ Obtained from Variation between Row Means. 
In this problem, each row mean is the average of 3 scores made 
by one of the 11 persons shooting at the target. The variance 
among these 11 means is given by the expression 

$$\frac{\sum_{i=1}^{11} (\bar{X}_{i.} - \bar{X})^2}{10},$$

which is an estimate of $\frac{\sigma^2}{3} + \frac{\sum_{i=1}^{11} (\mu_{i.} - \mu)^2}{10}$. Here $\sigma^2/3$ is the variance 
of means of 3 scores if these scores are chosen at random from the 
same population, with mean $\mu$ and variance $\sigma^2$, while 
$\sum_{i=1}^{11} (\mu_{i.} - \mu)^2 / 10$ 
is a component due to variation among population means of rows. 
The hypothesis $\mu_{1.} = \mu_{2.} = \cdots = \mu_{11.} = \mu$ would in this case be the 
hypothesis that the 11 archers have equal skill. Under that 
hypothesis this second component is zero and so $\frac{1}{10}\sum_{i=1}^{11} (\bar{X}_{i.} - \bar{X})^2$ is 
an unbiased estimate of $\sigma_{\bar{X}}^2 = \sigma^2/3$. Therefore under this hypothesis 
an unbiased estimate of $\sigma^2$ is provided by 

$$\frac{3\sum_{i=1}^{11} (\bar{X}_{i.} - \bar{X})^2}{10}.$$

This is the mean square among row means. For convenience we 
shall denote it R. In general, if there are r rows and c columns 
and one score in each cell, an unbiased estimate of $\sigma^2$ is provided by 

(9.21)  $$R = \frac{c\sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X})^2}{r-1}$$

if the hypothesis $\mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$ is true. 
A more convenient routine for computing R is 

(9.22)  $$R = \frac{\frac{1}{c}\sum_{i=1}^{r} T_{i.}^2 - \frac{T^2}{N}}{r-1}$$

The numerator of (9.21) and of (9.22) is the sum of squares for 
rows and the denominator is the corresponding degrees of freedom. 
Estimate of $\sigma^2$ Obtained from Variation between Column 
Means. In this problem a column mean is the average score 
made by all 11 archers on one particular order of shooting. The 
variance among the three column means is 

$$\frac{\sum_{j=1}^{3} (\bar{X}_{.j} - \bar{X})^2}{2},$$

which is an estimate of $\frac{\sigma^2}{11} + \frac{1}{2}\sum_{j=1}^{3} (\mu_{.j} - \mu)^2$. 
Here $\sigma^2/11$ is the variance of means of 11 scores if these are taken at 
random, while $\frac{1}{2}\sum_{j=1}^{3} (\mu_{.j} - \mu)^2$ is a component due to variation among 
population means of columns. Under the hypothesis that the 
three column means are equal ($\mu_{.1} = \mu_{.2} = \mu_{.3} = \mu$) this component 
is zero and so $\frac{1}{2}\sum_{j=1}^{3} (\bar{X}_{.j} - \bar{X})^2$ is an unbiased estimate of $\sigma_{\bar{X}}^2 = \sigma^2/11$. 
Therefore under this hypothesis an unbiased estimate of $\sigma^2$ is 
provided by 

$$\frac{11\sum_{j=1}^{3} (\bar{X}_{.j} - \bar{X})^2}{2}.$$

This is the mean square among column means which will be 
denoted C. In general, for r rows and c columns, 

(9.23)  $$C = \frac{r\sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X})^2}{c-1}$$

A more convenient routine for computing C is 

(9.24)  $$C = \frac{\frac{1}{r}\sum_{j=1}^{c} T_{.j}^2 - \frac{T^2}{N}}{c-1}$$

The numerator of (9.23) and of (9.24) is the sum of squares for 
columns and the denominator is the corresponding degrees of 
freedom. 

Hypotheses and Procedure for Testing Them. An important 
characteristic of the sample mean squares E, R, and C is that 
under the assumptions of normality and equality of variance in 
the population, these three statistics are independently distributed. 
Hence the F test may be used to test hypotheses. 

Usually two separate hypotheses are formulated, the 
hypothesis concerning row means: 

$$H_r: \mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$$

and the hypothesis concerning column means: 

$$H_c: \mu_{.1} = \mu_{.2} = \cdots = \mu_{.c} = \mu$$

These hypotheses may be tested separately and one may be 
rejected without the other. 

To test hypothesis $H_r$ that row means are equal, the ratio 
$F_r = R/E$ is calculated. 
To test hypothesis $H_c$ that column means are equal, the ratio 
$F_c = C/E$ is calculated. 


It is instructive to carry out the calculations indicated by 
Formulas (9.18), (9.21), and (9.23) directly, although far simpler 
procedures are available and will be described below. Students 
do not always correctly understand the meaning of the error term 
E, so Table 9.12 has been set up to list in detail the numbers (for 
the data of Table 9.8) whose squares are added to obtain the error 
sum of squares which is the numerator of E. Each entry in Table 
9.12 is a residual, the amount by which the observed value $X_{ij}$ 
differs from the estimate of that value based on row mean, column 
mean and total mean. 

$$E = \frac{(-13.70)^2 + (30.48)^2 + \cdots + (2.55)^2}{(10)(2)} = \frac{12530.909}{20} = 626.55$$

$$R = \tfrac{3}{10}\{(139.67 - 100.79)^2 + (113 - 100.79)^2 + \cdots + (110.33 - 100.79)^2\} = 4087.4$$

$$C = \tfrac{11}{2}\{(88.82 - 100.79)^2 + (112.64 - 100.79)^2 + (100.91 - 100.79)^2\} = 1560.5$$


TABLE 9.12 Value of $X_{ij} - (\bar{X}_{i.} + \bar{X}_{.j} - \bar{X})$ for Each Entry in Table 9.8 

            First      Second     Third 
Subject     order      order      order       Sum 
   1       -13.70      30.48     -16.79      -.01 
   2        -2.03      -3.85       5.88       .00 
   3          .97      26.15     -27.12       .00 
   4        -8.03      34.15     -26.12       .00 
   5        -7.36     -40.18      47.55       .01 
   6        30.30     -26.52      -3.79      -.01 
   7        -2.03     -24.85      26.88       .00 
   8       -12.70      19.48      -6.79      -.01 
   9         3.97      -6.85       2.88       .00 
  10         3.97       1.15      -5.12       .00 
  11         6.64      -9.18       2.55       .01 

  Sum         .00       -.02        .01      -.01 


$$F_r = \frac{R}{E} = \frac{4087.4}{626.55} = 6.52.$$

The error mean square is always placed in the denominator and the 
critical region is in the right tail of the F curve. For $n_1 = 10$ 
and $n_2 = 20$, $F_{.99} = 3.37$. The observed value 6.52 clearly falls 
in the region of rejection for $\alpha = .01$, and consequently the 
hypothesis that individuals do not differ in skill must be rejected. 

$$F_c = \frac{C}{E} = \frac{1560.5}{626.55} = 2.49.$$

For $n_1 = 2$ and $n_2 = 20$, $F_{.95} = 3.49$. 
The observed value 2.49 falls in the region of acceptance with 
$\alpha = .05$, hence there is no reason to reject the hypothesis that 
performance is independent of order of shooting, no reason to suppose 
that higher scores are to be expected when the 50-yard range is 
shot in one order than another. 
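
The direct calculation of E, R, and C from Formulas (9.18), (9.21), and (9.23) can be sketched in a few lines of Python. This is only an illustration: the function name `two_way_mean_squares` is an assumption introduced here, and the scores are those of Table 9.8. Because the means are not rounded before squaring, the mean squares may differ in the last digit from the hand computation above.

```python
# Scores from Table 9.8: one row per subject; columns are first, second,
# and third order of shooting.
scores = [
    [114, 182, 123], [99, 121, 119], [51, 100, 35], [78, 144, 72],
    [134, 125, 201], [71, 38, 49], [66, 67, 107], [33, 89, 51],
    [146, 159, 157], [80, 101, 83], [105, 113, 113],
]


def two_way_mean_squares(x):
    """Return (E, R, C) for an r-by-c table with one score per cell."""
    r, c = len(x), len(x[0])
    grand = sum(map(sum, x)) / (r * c)
    row_means = [sum(row) / c for row in x]
    col_means = [sum(x[i][j] for i in range(r)) / r for j in range(c)]
    # Residuals X_ij - row mean - column mean + grand mean, Formula (9.18).
    ss_error = sum(
        (x[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(r) for j in range(c)
    )
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)  # numerator of (9.21)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)  # numerator of (9.23)
    return ss_error / ((r - 1) * (c - 1)), ss_rows / (r - 1), ss_cols / (c - 1)


E, R, C = two_way_mean_squares(scores)
F_r, F_c = R / E, C / E
print(f"E = {E:.2f}, R = {R:.2f}, C = {C:.2f}")
print(f"F_r = {F_r:.2f}, F_c = {F_c:.2f}")
```

The two ratios are then compared with the tabled values of F for (10, 20) and (2, 20) degrees of freedom, exactly as in the text.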

Additive Nature of Component Sums of Squares. For each 
$X_{ij}$ the following identity holds 

$$X_{ij} - \bar{X} = (\bar{X}_{i.} - \bar{X}) + (\bar{X}_{.j} - \bar{X}) + (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})$$

When this identity is squared we get 

$$(X_{ij} - \bar{X})^2 = (\bar{X}_{i.} - \bar{X})^2 + (\bar{X}_{.j} - \bar{X})^2 + (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$$
$$\quad + 2[(\bar{X}_{i.} - \bar{X})(\bar{X}_{.j} - \bar{X}) + (\bar{X}_{i.} - \bar{X})(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X}) + (\bar{X}_{.j} - \bar{X})(X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})]$$

When the sum is taken over all rows and all columns, the sum of 
each expression inside the square brackets is found to be 
identically zero, so the cross product terms vanish, leaving 

(9.25)  $$\sum_{j=1}^{c}\sum_{i=1}^{r} (X_{ij} - \bar{X})^2 = c\sum_{i=1}^{r} (\bar{X}_{i.} - \bar{X})^2 + r\sum_{j=1}^{c} (\bar{X}_{.j} - \bar{X})^2 + \sum_{j=1}^{c}\sum_{i=1}^{r} (X_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$$

or 

Total sum of squares = sum of squares for rows 
                     + sum of squares for columns 
                     + sum of squares for error 

The three additive component sums of squares are independently 
distributed with r - 1 degrees of freedom among row means, c - 1 
degrees of freedom among column means, and (r - 1)(c - 1) 
degrees of freedom for error. These degrees of freedom together 
make up the rc - 1 degrees of freedom for the total sum of squares. 
When each of the component sums is divided by its degrees of 
freedom, the results are the mean squares R, C, and E. 
Formula (9.25) may be rewritten in equivalent form as 

(9.26)  $$\sum\sum X_{ij}^2 - \frac{T^2}{N} = \left(\frac{1}{c}\sum T_{i.}^2 - \frac{T^2}{N}\right) + \left(\frac{1}{r}\sum T_{.j}^2 - \frac{T^2}{N}\right) + \left(\sum\sum X_{ij}^2 - \frac{1}{c}\sum T_{i.}^2 - \frac{1}{r}\sum T_{.j}^2 + \frac{T^2}{N}\right)$$

where the quantities in parentheses are the numerators of 
Formulas (9.22), (9.24), and (9.20) respectively. An easy way to 
obtain the sum of squares due to error is expressed schematically 
as follows: 

Error = Total - Rows - Columns 
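
The routine of Formula (9.26), which works from row and column totals rather than from deviations, might be sketched in Python as below. The function name `sums_of_squares_from_totals` and the dictionary keys are assumptions made for this illustration; the scores are again those of Table 9.8.

```python
# Scores from Table 9.8 (rows are subjects, columns are orders of shooting).
scores = [
    [114, 182, 123], [99, 121, 119], [51, 100, 35], [78, 144, 72],
    [134, 125, 201], [71, 38, 49], [66, 67, 107], [33, 89, 51],
    [146, 159, 157], [80, 101, 83], [105, 113, 113],
]


def sums_of_squares_from_totals(x):
    """Component sums of squares from totals, following Formula (9.26)."""
    r, c = len(x), len(x[0])
    n = r * c
    row_totals = [sum(row) for row in x]                              # T_i.
    col_totals = [sum(x[i][j] for i in range(r)) for j in range(c)]   # T_.j
    correction = sum(row_totals) ** 2 / n                             # T^2 / N
    total = sum(v * v for row in x for v in row) - correction
    rows = sum(t * t for t in row_totals) / c - correction
    columns = sum(t * t for t in col_totals) / r - correction
    error = total - rows - columns          # Error = Total - Rows - Columns
    return {"total": total, "rows": rows, "columns": columns, "error": error}


ss = sums_of_squares_from_totals(scores)
for source, value in ss.items():
    print(f"{source:8s} {value:10.1f}")
```

Only integer sums and sums of squares are accumulated before the final divisions, which is the reason this routine incurs less rounding error than working from rounded means.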


Table 9.13 shows computations for obtaining the three 
component sums of squares by the routine of Formula (9.26) for the 
archery scores of Table 9.8. The results of this computation are 
shown in Table 9.14. 

TABLE 9.13 Computation of Sums of Squares for Data of Table 9.8 

Subject            $X_{i1}$   $X_{i2}$   $X_{i3}$   $T_{i.}$   $T_{i.}^2$   $\sum_j X_{ij}^2$ 
   1                 114        182        123        419       175561        61249 
   2                  99        121        119        339       114921        38603 
   3                  51        100         35        186        34596        13826 
   4                  78        144         72        294        86436        32004 
   5                 134        125        201        460       211600        73982 
   6                  71         38         49        158        24964         8886 
   7                  66         67        107        240        57600        20294 
   8                  33         89         51        173        29929        11611 
   9                 146        159        157        462       213444        71246 
  10                  80        101         83        264        69696        23490 
  11                 105        113        113        331       109561        36563 
$T_{.j}$             977       1239       1110       3326 
$T_{.j}^2$        954529    1535121    1232100 
$\sum_i X_{ij}^2$  98665     156231     136858     391754 

$T^2/N = (3326)^2/33 = 335220.5$ 

Rows:     $\sum T_{i.}^2 / 3 = 1128308/3 = 376102.7$ 
          $T^2/N = 335220.5$ 
          Sum of squares for rows = 40882.2 

Columns:  $\sum T_{.j}^2 / 11 = 3721750/11 = 338340.9$ 
          $T^2/N = 335220.5$ 
          Sum of squares for columns = 3120.4 

Total:    $\sum\sum X_{ij}^2 = 391754$ 
          $T^2/N = 335220.5$ 
          Total sum of squares = 56533.5 

Error:    40882.2 + 3120.4 = 44002.6 
          Sum of squares for error = 56533.5 - 44002.6 = 12530.9 


TABLE 9.14 Analysis of Variance of Data from Table 9.8 

                                 Sum of     Degrees of    Mean                Critical 
Source of variation              squares    freedom       square      F       F 
Total group                      56533.5       32 
Between means of individuals     40882.2       10         4088.22     6.53    3.37 
Between order means               3120.4        2         1560.2      2.49    3.49 
Error                            12530.9       20          626.5 


In general the analysis of variance computations may be sum- 
marized as in Table 9.15. 


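TABLE 9.15 General Summary of Analysis of Variance Components for Data 
with r Rows and c Columns and One Score in Each Cell 

Source of                                                                                             Expectation of 
variation          Sum of squares                                                    d. f.            mean square 
Total              $\sum\sum X_{ij}^2 - \frac{T^2}{N}$                                $rc - 1$ 
Between row 
means              $\frac{1}{c}\sum T_{i.}^2 - \frac{T^2}{N}$                         $r - 1$          $\sigma^2 + \frac{c\sum(\mu_{i.} - \mu)^2}{r - 1}$ 
Between column 
means              $\frac{1}{r}\sum T_{.j}^2 - \frac{T^2}{N}$                         $c - 1$          $\sigma^2 + \frac{r\sum(\mu_{.j} - \mu)^2}{c - 1}$ 
Error              $\sum\sum X_{ij}^2 - \frac{1}{c}\sum T_{i.}^2 - \frac{1}{r}\sum T_{.j}^2 + \frac{T^2}{N}$   $(r-1)(c-1)$   $\sigma^2$ 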


Computations When Original Scores are not Available. 
Sometimes one wishes to apply the test described on pages 216 to 226 
to published data for which the original scores are not available. 
It would then be impossible to use any of the formulas which 
have been presented in this chapter, but if for each subgroup the 
computed mean and variance or standard deviation are given and 
the number of cases, the formulas can be adapted as follows: 

(9.27) Mean of total group, $\bar{X} = \frac{\sum_{j=1}^{k} N_j \bar{X}_j}{\sum_{j=1}^{k} N_j}$ 

(9.28) Sum of squares within groups, $\sum_{j=1}^{k} (N_j - 1)s_j^2$ 

(9.29) Sum of squares among means, $\sum_{j=1}^{k} N_j(\bar{X}_j - \bar{X})^2 = \sum_{j=1}^{k} N_j\bar{X}_j^2 - N\bar{X}^2$ 

A difficulty often encountered in trying to use published data in 
this way is that the original computer may not have retained 
enough digits to have any significant digits in the sum of squares 
among means. Another difficulty is that one cannot always be 
sure whether N or N - 1 was used as denominator in the original 
computation of the variance. 
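
As an illustration of Formulas (9.27) to (9.29), the following Python sketch rebuilds the variance ratio from group sizes, means, and variances alone. The function name `anova_from_summaries` and the three small groups of scores are invented for this example; the sketch also assumes that each published variance was computed with $N_j - 1$ as denominator, which is exactly the point the text warns one cannot always be sure of.

```python
from statistics import mean, variance


def anova_from_summaries(groups):
    """groups: list of (N_j, mean_j, s2_j), with s2_j based on N_j - 1."""
    n_total = sum(n for n, _, _ in groups)
    k = len(groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total           # (9.27)
    ss_within = sum((n - 1) * s2 for n, _, s2 in groups)              # (9.28)
    ss_among = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)   # (9.29)
    f = (ss_among / (k - 1)) / (ss_within / (n_total - k))
    return grand_mean, ss_within, ss_among, f


# Hypothetical raw scores, used only to manufacture "published" summaries.
raw = [[3, 5, 4, 6], [7, 8, 6], [2, 4, 3, 3, 5]]
summaries = [(len(g), mean(g), variance(g)) for g in raw]  # variance uses N - 1

grand_mean, ss_within, ss_among, f = anova_from_summaries(summaries)
print(f"mean = {grand_mean:.3f}, within = {ss_within:.2f}, "
      f"among = {ss_among:.2f}, F = {f:.2f}")
```

If a source reports variances with N as denominator, each $s_j^2$ would first have to be multiplied by $N_j/(N_j - 1)$, or equivalently $N_j s_j^2$ used in place of $(N_j - 1)s_j^2$.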

Factors Influencing F. In order to assist the student in rec- 
ognizing the general kinds of situations in which the mean square 
for means is typically less than, equal to, or greater than the mean 
square for individuals within groups let us consider a hypothetical 
situation. Imagine that 500 names have been taken at random 
from the list of registered voters in New York City, and that a 
card has been made out for each of the 500 persons, containing 
such information as age, sex, marital status, number of children, 
occupation, number of years of schooling completed, height, 
weight, index of physical strength, party ticket voted in preceding 
presidential election, score on a mental test, score on a test of 
public affairs, voting precinct. Obviously such data would be 
difficult to obtain, but they are real enough to the imagination to 
serve as a good illustration. 

A. Now let the 500 cards be thoroughly shuffled and dealt into 
20 random piles of 25 each. For the mental test scores, let the 
mean square among groups and the mean square within groups 
be computed and the F ratio obtained and recorded. Then let 
the cards be shuffled again and again dealt into 20 random piles 
of 25 cards each and the F ratio computed. Let this be done many 
times. Sometimes the mean square between groups will be larger 
and sometimes smaller than the mean square within groups. The 
distribution of the obtained values of F will approximate a smooth 
F curve with $n_1 = 19$ and $n_2 = 480$. 
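
Situation A can be imitated on a machine. The sketch below is a rough illustration only: the 500 mental test scores are drawn from a normal distribution with an arbitrarily chosen mean and standard deviation, and the names `f_ratio`, `mean_f`, and `share_below_one` are assumptions of the example, not quantities defined in the text.

```python
import random


def f_ratio(groups):
    """Mean square between groups divided by mean square within groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = random.Random(0)
# Hypothetical mental test scores for the 500 cards.
cards = [rng.gauss(100, 15) for _ in range(500)]

ratios = []
for _ in range(1000):
    rng.shuffle(cards)                                   # shuffle thoroughly
    piles = [cards[i:i + 25] for i in range(0, 500, 25)]  # 20 piles of 25
    ratios.append(f_ratio(piles))

mean_f = sum(ratios) / len(ratios)
share_below_one = sum(f < 1 for f in ratios) / len(ratios)
print(f"mean F = {mean_f:.3f}, share of F below 1 = {share_below_one:.2f}")
```

Under random dealing the recorded ratios scatter around 1, falling below it in roughly half the trials, which is the behavior the F curve for 19 and 480 degrees of freedom describes.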

If instead of score on mental test, score on test of public affairs 
had been used, both mean squares might have been very different 
from those obtained in A but the theoretical F distribution would 
be the same because its form depends upon the number of de- 
grees of freedom and not upon the nature of the physical variate 
studied. 

Suppose the same procedure were carried through with the 
data on income, or on number of years of schooling completed. 
These variates probably have a very skewed distribution and the 
assumption of normality in the trait measured, which is basic 
to the F test, would not be met. Certain other traits named such 
as index of physical strength, height, or weight, might have a 
bimodal distribution if both men and women are in the group 
studied. However, if the distribution of the trait does not depart 
too radically from the normal, the distribution of the variance 
ratio will resemble the F distribution under the experimental 
conditions described above. 

B. Now let the same 500 cards be sorted on the basis of voting 
precinct, and let us suppose 20 precincts are represented. For 
several of the variates we have mentioned, a group of persons 
living in the same neighborhood and consequently registered in 
the same precinct would be more homogeneous than a group 
obtained by random selection from the entire city. Hence in this 
situation the mean square within groups is almost certain to be 
reduced and that between groups increased in comparison with 
situation A. Consider income or years of schooling completed. 
One feels a rather strong conviction that group means will vary 
more from precinct to precinct than in A and that individual 
scores will vary less from person to person in the same precinct 
than in A. Consequently, the numerator of the variance ratio 
would tend to be larger than the denominator, though not 
necessarily larger in every sample. Also, if the cards were sorted on 
the basis of occupation or sex, the income data might be expected 
to show a mean square between groups in general larger than that 
within groups, and therefore the variance ratio would not conform 
to the F distribution but would have fewer small values of F and 
more large values than would be expected under situation A. 
Here the various groups are in reality samples of different 
populations. 

C. Now let the same 500 cards be sorted as before by 
precincts, but instead of keeping the cards from a given precinct 
together we shall distribute them so that each group receives 
approximately 1/20 of the cards from each precinct. Now if the 
mean precinct score on any given trait such as income varies 
greatly from precinct to precinct, this new sorting will have the 
effect of making the mean square between means in general less 
than in either A or B. If the mean square between means is 
reduced that within groups must be increased for the total mean 
square has not been affected by the manner of sorting the cards. 
Therefore in such a case F would tend to be smaller than 1. 
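
The contrast between situations B and C may be seen in a small numerical sketch. Everything in it is hypothetical: 20 precincts of 25 voters each, incomes drawn from normal populations whose means differ markedly from precinct to precinct, and a round-robin deal standing in for "approximately 1/20 of the cards from each precinct." The names `f_ratio`, `f_b`, and `f_c` are assumptions of the example.

```python
import random


def f_ratio(groups):
    """Mean square between groups divided by mean square within groups."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


rng = random.Random(1)

# Hypothetical incomes: precinct means are far apart compared with the
# spread of incomes inside any one precinct.
precincts = [
    [rng.gauss(2000 + 300 * p, 400) for _ in range(25)]
    for p in range(20)
]

# Situation B: each group is one precinct.
f_b = f_ratio(precincts)

# Situation C: the cards stay sorted by precinct and are dealt round-robin
# into 20 groups, so every group receives one or two cards from each precinct.
cards = [income for precinct in precincts for income in precinct]
groups_c = [cards[g::20] for g in range(20)]
f_c = f_ratio(groups_c)

print(f"B: F = {f_b:.1f}   C: F = {f_c:.3f}")
```

With precinct means this far apart, sorting by precinct yields a ratio far above 1, while spreading every precinct over all groups yields a ratio well below 1, in keeping with the argument of B and C.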
EXERCISE 9.2 

1. In each of the following situations state the number of degrees of 
freedom with which the F table should be entered and the corresponding 
value of $F_{.95}$. 

a. A comparison of the means of 5 groups in which $N_1 = 12$, $N_2 = 30$, 
$N_3 = 6$, $N_4 = 13$, $N_5 = 8$. 

b. A comparison of the means of two groups each having 25 cases. 

c. There are 5 test scores for each of 20 subjects. We want to test the 
hypothesis that the scores do not distinguish the subjects, that all subjects 
have the same mean. 

d. There are 5 test scores on each of 20 subjects. We want to test 
the hypothesis that the tests are equally difficult, that they all have the 
same mean. 

2. Complete the analysis of the data in Table 9.6. 

3. In a study of the amount of community information possessed 
by 79 elementary school teachers in St. Louis, Harris constructed a social 
information test and administered it to teachers in four schools. These 
schools were located in districts that differed markedly as to socio-economic 
status, Jefferson being at one extreme and Banneker at the other. 
However, if the differences among the means are no greater than might readily 
occur by chance, it would be inappropriate to seek for any other 
explanation of their origin. From the data reported on page 192, test the hypothesis 
that $\mu_1 = \mu_2 = \mu_3 = \mu_4$. 
In this chapter, methods of statistical inference will be 
applied to bivariate data, that is to data in which each individual 
receives a score on two variables, two scaled traits. In Chapter 13 
the work of the present chapter will be extended to situations in 
which each individual receives a score on more than two variables 
(multivariate analysis). In Chapter 11 there will be a discussion 
of methods of measuring relationship between two traits when 
for some reason the methods of the current chapter are not 
applicable. 

Although many readers of this chapter will have studied re- 
gression and correlation in an earlier course, a brief review of the 
principles, formulas and computing procedures may be helpful to 
some. Such a review will be presented in the first part of this 
chapter. 

Linear Regression. Regression is the estimation, or predic- 
tion, of unknown values of one variable from known values of 
another variable. An example of such use of regression is the 
prediction of grades at the end of a course from grades in a prog- 
nostic test given prior to the beginning of the course. 

Let X be a variable for which the value is known for each 
individual in a study. This is commonly called the independent 
variable. Let Y be a variable for which the value is to be 
estimated for some individuals. This is commonly called the 
dependent variable. Let $X_\alpha$ and $Y_\alpha$ be scores for the $\alpha$th 
individual. The reader should note that this is a new use of the 
letter $\alpha$, differing from its use to designate the size of a region of 
rejection. The letter $\alpha$ will be used in both senses from this point 
on, but its meaning will be readily apparent from context. From 
this point on, many problems will involve summing over rows, over 
columns, and also over individuals within cells, and formulas may 
require three, or sometimes more, subscripts. The use of English 
letters to denote the categories and the Greek letter $\alpha$ to denote 
the individual within the category makes the meaning of a formula 
more obvious. Thus $\sum_\alpha X_{ij\alpha}$ can be understood at once as the sum 
for all individuals in the ijth class, while $\sum_k X_{ijk}$ might mean 
different things in different problems and would require a specific 
definition to state whether all three letters indicated categories or 
one of them referred to individuals. Let $\tilde{Y}$ be the estimate of an 
individual's unknown Y from his known X. It is called "regression 
Y," "estimated Y," "predicted Y," "tilde Y," and sometimes 
"curlicue Y." The meaning of $Y_\alpha$ as an observed value 
and $\tilde{Y}_\alpha$ as the best linear estimate based on knowledge of $X_\alpha$ 
must never be confused. (For an account of the origin of the term 
"regression" see reference 11.) 

The relation between $\tilde{Y}$ and X is called the regression equation 
and, for linear regression, is given by the formula 

(10.1)  $$\tilde{Y}_\alpha = \bar{Y} + b_{yx}(X_\alpha - \bar{X})$$

In this formula $\tilde{Y}_\alpha$ and $X_\alpha$ vary from individual to individual, 
while $\bar{Y}$, $b_{yx}$, and $\bar{X}$ are constant for all individuals to whom the 
formula is applied. 

The statistic $b_{yx}$ may be expressed in terms of deviations from 
the means as 

(10.2)  $$b_{yx} = \frac{\sum (X_\alpha - \bar{X})(Y_\alpha - \bar{Y})}{\sum (X_\alpha - \bar{X})^2}$$

The computing routine suggested by this and Formulas (10.3) 
to (10.5) would be unnecessarily laborious and would incur large 
rounding errors. Equivalent formulas that are more convenient 
to use will be found in later sections where machine computation 
and computation from a scatter diagram are treated. There is a 
very large number of equivalent formulas which are useful to know. 
The efficient computer understands the algebraic relations among 
them and makes use of whichever one appears to offer the simplest 
routine for the data in hand. In order to keep the development 
simple, only the most necessary formulas are presented in the 
chapter but a longer list is given in an Appendix to the chapter. 

The reason for heading this section "Linear regression" is 
that the graph of Formula (10.1) is a straight line. This formula 
is appropriate to use when the mean value of Y for each given value 
of X lies near a straight line. If these means depart considerably 
from the straight regression line, a curvilinear regression line is 
more appropriate. Curvilinear regression will not be taken up 
here. It is treated in Chapter 23 of reference 1, in Chapter 6 of 
reference 5, and in Chapter 12 of reference 8. 

If a graph of Formula (10.1) were plotted on a chart of the XY 
distribution the constant $b_{yx}$ would represent the slope of the line, 
and the line would pass through the point $(\bar{X}, \bar{Y})$. The reader 
who does not understand the relation between an equation and its 
graph will find Chapter 10 of reference 10 helpful. 

The difference $Y_\alpha - \tilde{Y}_\alpha$ between an individual's observed Y 
score and the regression estimate for that score is called a residual 
or residual error. The sum of the squares of the residuals for all 
N individuals is 

(10.3)  $$\sum (Y - \tilde{Y})^2 = \sum (Y - \bar{Y})^2 - \frac{[\sum (X - \bar{X})(Y - \bar{Y})]^2}{\sum (X - \bar{X})^2}$$

This sum has N - 2 degrees of freedom because there are N 
observations from which two constants, the mean of Y and the 
slope of the regression line, have been obtained. 

The equation in Formula (10.3) may be read as follows: The 
sum of squares of deviations of individual Y scores from the 
regression estimates corresponding to these scores is less than the 
sum of squares of the deviations from the mean, $\bar{Y}$, for all these 
scores by the quantity 

$$\frac{[\sum (X - \bar{X})(Y - \bar{Y})]^2}{\sum (X - \bar{X})^2}$$

If only Y scores are available, $\bar{Y}$ is the best estimate of an 
unknown Y. For a given sample size the quantity $\sum (Y - \bar{Y})^2$ is a 
measure of the precision of estimate obtained from $\bar{Y}$. If an X 
score is known for an individual, but the Y score is not, then $\tilde{Y}$, 
calculated from the known X, is the best estimate of Y. For a 
given sample size the precision of an estimate of Y obtained from 
a known X is given by $\sum (Y - \tilde{Y})^2$. The quantity 

$$\frac{[\sum (X - \bar{X})(Y - \bar{Y})]^2}{\sum (X - \bar{X})^2}$$

measures the improvement in precision of estimating Y obtained 
by using X over the precision in not using X. Clearly, the larger 
this quantity is, the greater is the improvement in estimating Y by 
using X. 
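
Formulas (10.1) to (10.3) may be tried on a few made-up scores. In the Python sketch below the six pairs of X and Y values are hypothetical, and the names `fit_line`, `ss_residual`, `ss_about_mean`, and `improvement` are assumptions introduced for the illustration. The residual sum of squares is computed once from the individual residuals and once from the right-hand side of (10.3), and the two agree apart from rounding.

```python
def fit_line(xs, ys):
    """Return the means, the slope b_yx of Formula (10.2), and its two sums."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return x_bar, y_bar, s_xy / s_xx, s_xy, s_xx


# Hypothetical scores on a prognostic test (X) and a course grade (Y).
xs = [2, 4, 5, 7, 9, 10]
ys = [50, 61, 58, 70, 79, 84]

x_bar, y_bar, b_yx, s_xy, s_xx = fit_line(xs, ys)

# Regression estimates from Formula (10.1).
estimates = [y_bar + b_yx * (x - x_bar) for x in xs]

ss_residual = sum((y - e) ** 2 for y, e in zip(ys, estimates))
ss_about_mean = sum((y - y_bar) ** 2 for y in ys)
improvement = s_xy ** 2 / s_xx          # the quantity subtracted in (10.3)

print(f"b_yx = {b_yx:.4f}")
print(f"residual SS = {ss_residual:.3f}")
print(f"SS about mean - improvement = {ss_about_mean - improvement:.3f}")
```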
The Correlation Coefficient. We may write Formula (10.3) as 

(10.4)  $$\sum (Y - \tilde{Y})^2 = (1 - r^2)\sum (Y - \bar{Y})^2$$

if we define r by the formula 

(10.5)  $$r = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sqrt{\sum (X - \bar{X})^2 \sum (Y - \bar{Y})^2}}$$

It becomes obvious that the value of r measures the gain in 
precision of estimate of an unknown Y from a known X. If r is zero, 
$\sum (Y - \tilde{Y})^2$ is the same as $\sum (Y - \bar{Y})^2$ and there is no gain in 
precision. If r is near +1 or near -1 the gain in precision is great, 
for then $\sum (Y - \tilde{Y})^2$ is much less than $\sum (Y - \bar{Y})^2$. 

The quantity r defined in Formula (10.5) is called the coefficient 
of linear correlation. The word "linear" is included because of its 
relationship to straight line regression. This coefficient is often 
called the product moment coefficient because the quantity 
$\sum (X - \bar{X})(Y - \bar{Y})$ is a product moment. It is also called Pearson 
r because of the important contributions of Karl Pearson. 

In some problems interest centers on the estimation of scores 
by means of a regression equation; other problems are concerned 
chiefly with the mutual interrelationship between two variables 
as measured by the coefficient of correlation. In some problems 
both regression equations 

$$\tilde{Y} = \bar{Y} + b_{yx}(X - \bar{X})$$
and 
$$\tilde{X} = \bar{X} + b_{xy}(Y - \bar{Y})$$

are of interest, in others only one. Thus there would be practical 
value in predicting college achievement from a prognostic test 
but little value in estimating prognostic test scores from measures 
of college achievement. In a regression equation the relationship 
is from known scores to estimated scores and is not reversible. In 
correlation the relationship is mutual, and 

(10.6)  $$r_{xy} = \sqrt{b_{yx} b_{xy}}$$

with r taking the sign common to the two regression coefficients. 
The order of letters in the subscript for r is immaterial, $r_{xy} = r_{yx}$. 
For b the order is important and in general $b_{xy} \neq b_{yx}$. 
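
The relations (10.4), (10.5), and (10.6) can be checked numerically. The sketch below uses six hypothetical pairs of scores; the variable names are assumptions of the example. It computes r from the deviation form (10.5), both regression coefficients, and the residual sum of squares for predicting Y from X.

```python
from math import sqrt

# Hypothetical scores; any paired data with some spread would serve.
xs = [2, 4, 5, 7, 9, 10]
ys = [50, 61, 58, 70, 79, 84]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
s_yy = sum((y - y_bar) ** 2 for y in ys)

r = s_xy / sqrt(s_xx * s_yy)     # Formula (10.5)
b_yx = s_xy / s_xx               # slope for estimating Y from X
b_xy = s_xy / s_yy               # slope for estimating X from Y

# Residual sum of squares about the regression of Y on X.
ss_residual = sum((y - (y_bar + b_yx * (x - x_bar))) ** 2
                  for x, y in zip(xs, ys))

print(f"r = {r:.4f}, b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"(1 - r^2) * SS_y = {(1 - r * r) * s_yy:.3f}, residual SS = {ss_residual:.3f}")
print(f"sqrt(b_yx * b_xy) = {sqrt(b_yx * b_xy):.4f}")
```

Here both slopes are positive, so r equals the positive square root of their product; with negative slopes the same magnitude would carry a minus sign.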

Machine Computation of $b_{yx}$ and $r_{xy}$ Without the Use of a
Scatter Diagram. To illustrate the way in which data may be
organized so that adequate checks are obtained when a computing
machine is used, a problem will be set up using a relatively small
number of cases. The members of a class in first term statistics
were given several prognostic tests at the first meeting of the
course. One of these, which is here called X, was a specially
constructed test in artificial language. The criterion variable, Y, is
the semester grade in the course. The X and Y scores for a part
of the class are listed in Table 10.1, the number of cases having
been reduced because it does not seem likely that readers of this
text will learn any more from working with a long series of data
than with a short one. The X - Y and X + Y columns are
utilized for checks.

In machine work there is no need to keep numbers small and
therefore gross scores (X and Y) are used instead of deviations
($X - \bar{X}$ and $Y - \bar{Y}$, or $X - A_x$ and $Y - A_y$). Less
work is involved and greater accuracy secured if division and square
root, which usually involve rounding error, are performed as late in
the process as possible. The following formulas for $b_{yx}$ and
$r_{xy}$ meet these criteria and are equivalent to Formulas (10.2) and
(10.5):

(10.7) $b_{yx} = \dfrac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}$

(10.8) and $r_{xy} = \dfrac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$


On certain machines it is possible to enter X, Y, $X^2$, $Y^2$ and XY
in a single setting of the machine, and thus to obtain $\sum X$, $\sum Y$,
$\sum X^2$, $\sum Y^2$ and $\sum XY$ in one process (see K. Pease). Even with
a machine, some check on the correctness of the results is essential.
One possible check is to cumulate the XY products as a separate
operation. If the result agrees with the $\sum XY$ previously obtained it
is unlikely that there is an error in the sums found, but this check
does not guarantee correctness. A foolproof check can be obtained
from the sums and sums of squared entries in the X - Y and
X + Y columns. If the five sums named above have already been
found by the method described by Pease, use of the X - Y column
alone would provide a complete check. Any of these formulas may
be used to check the sums:


(10.9) $\sum X + \sum Y = \sum (X + Y)$
(10.10) $\sum X - \sum Y = \sum (X - Y)$
(10.11) $\sum (X + Y) + \sum (X - Y) = 2\sum X$
(10.12) $\sum (X + Y) - \sum (X - Y) = 2\sum Y$
TABLE 10.1 Marks of Students in First Term Statistics Course (Y) and
Scores on an Artificial Language Test (X) Used to Predict Term Marks


Student      X      Y    X - Y    X + Y
   1        52     62     -10      114
   2        57     50       7      107
   3        56     57      -1      113
   4        30     32      -2       62
   5        47     55      -8      102
   6        49     43       6       92
   7        60     59       1      119
   8        56     57      -1      113
   9        53     55      -2      108
  10        55     56      -1      111
  11        54     55      -1      109
  12        53     56      -3      109
  13        58     57       1      115
  14        57     48       9      105
  15        45     36       9       81
  16        46     57     -11      103
  17        60     63      -3      123
  18        37     47     -10       84
  19        24     43     -19       67
  20        43     57     -14      100
  21        47     57     -10      104
  22        50     47       3       97
  23        48     47       1       95
  24        42     49      -7       91
  25        59     51       8      110
  26        40     48      -8       88
  27        41     58     -17       99
  28        47     63     -16      110
  29        43     56     -13       99
  30        57     50       7      107
  31        32     46     -14       78
  32        53     43      10       96
  33        46     39       7       85
  34        54     47       7      101
  35        49     48       1       97
  36        58     55       3      113
  37        55     47       8      102
  38        14     35     -21       49
  39        50     62     -12      112
  40        41     59     -18      100
  41        14     29     -15       43
  42        46     54      -8      100

Sum of scores              1978    2135    -157     4113
Sum of squares of scores  98232  111399    3937   415325
These formulas may be used to check sums of squares and sums of
products:

(10.13) $\sum (X + Y)^2 + \sum (X - Y)^2 = 2\sum X^2 + 2\sum Y^2$

(10.14) $\sum XY = \tfrac{1}{4}[\sum (X + Y)^2 - \sum (X - Y)^2]$


The checks of Formulas (10.9) to (10.13) all hold for the data
of Table 10.1. Then we may compute $\sum XY$ by (10.14) as

$\tfrac{1}{4}[415325 - 3937] = 102847$

The following values have now been obtained:

N = 42             $\sum XY$ = 102847    $\sum X^2$ = 98232
$\sum X$ = 1978    $\sum Y$ = 2135       $\sum Y^2$ = 111399
$\bar{X}$ = 47.10  $\bar{Y}$ = 50.83


From these the following computations are made:

$N\sum xy = N\sum XY - (\sum X)(\sum Y) = 42(102847) - (1978)(2135) = 96544$
$N\sum x^2 = N\sum X^2 - (\sum X)^2 = 42(98232) - (1978)^2 = 213260$
$N\sum y^2 = N\sum Y^2 - (\sum Y)^2 = 42(111399) - (2135)^2 = 120533$

$b_{yx} = \dfrac{96544}{213260} = .453$

$r_{xy} = \dfrac{96544}{\sqrt{(213260)(120533)}} = .602$

$\hat{Y} = 50.83 + .453(X - 47.10)$
$\hat{Y} = .453X + 29.5$


To see the effect of the regression equation we may apply it
to the X scores of the first 5 cases in Table 10.1 to see what Y
score would have been estimated for them if that Y score had
happened to be unknown.


Student     X    $\hat{Y}$     Y    $Y - \hat{Y}$   $Y - \bar{Y}$
   1       52      53.1       62         8.9            11.2
   2       57      55.3       50        -5.3             -.8
   3       56      54.9       57         2.1             6.2
   4       30      43.1       32       -11.1           -18.8
   5       47      50.8       55         4.2             4.2


In no case did the estimate of Y based on X agree precisely with
the obtained Y; the residual error $Y - \hat{Y}$ was sometimes positive
and sometimes negative. On the whole the deviation of Y from
the regression value ($Y - \hat{Y}$) seems to be smaller than its
deviation from the mean ($Y - \bar{Y}$). A method of measuring the
efficiency of prediction will be discussed on page 244.
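
The machine routine described above translates directly into a short program. The following is a minimal sketch, assuming the 42 score pairs of Table 10.1 are keyed in as a list; the variable names are illustrative. Because a program derives every column from the same list, checks (10.9) to (10.14) hold here by construction; they are included to show what the checks assert, and they guard against error mainly when the columns are keyed in separately, as on a desk machine.

```python
from math import sqrt

# (X, Y) score pairs for the 42 students of Table 10.1.
PAIRS = [
    (52, 62), (57, 50), (56, 57), (30, 32), (47, 55), (49, 43), (60, 59),
    (56, 57), (53, 55), (55, 56), (54, 55), (53, 56), (58, 57), (57, 48),
    (45, 36), (46, 57), (60, 63), (37, 47), (24, 43), (43, 57), (47, 57),
    (50, 47), (48, 47), (42, 49), (59, 51), (40, 48), (41, 58), (47, 63),
    (43, 56), (57, 50), (32, 46), (53, 43), (46, 39), (54, 47), (49, 48),
    (58, 55), (55, 47), (14, 35), (50, 62), (41, 59), (14, 29), (46, 54),
]

N = len(PAIRS)
SX = sum(x for x, _ in PAIRS)
SY = sum(y for _, y in PAIRS)
SXX = sum(x * x for x, _ in PAIRS)
SYY = sum(y * y for _, y in PAIRS)
SXY = sum(x * y for x, y in PAIRS)

# Sums and sums of squares of the check columns X + Y and X - Y.
S_SUM = sum(x + y for x, y in PAIRS)
S_DIF = sum(x - y for x, y in PAIRS)
SS_SUM = sum((x + y) ** 2 for x, y in PAIRS)
SS_DIF = sum((x - y) ** 2 for x, y in PAIRS)

assert SX + SY == S_SUM                      # (10.9)
assert SX - SY == S_DIF                      # (10.10)
assert S_SUM + S_DIF == 2 * SX               # (10.11)
assert S_SUM - S_DIF == 2 * SY               # (10.12)
assert SS_SUM + SS_DIF == 2 * SXX + 2 * SYY  # (10.13)
assert 4 * SXY == SS_SUM - SS_DIF            # (10.14)

# Division and square root are left to the very end, as the text advises.
num = N * SXY - SX * SY
den_x = N * SXX - SX ** 2
den_y = N * SYY - SY ** 2
b_yx = num / den_x                # (10.7)
r_xy = num / sqrt(den_x * den_y)  # (10.8)
print(round(b_yx, 3), round(r_xy, 3))  # 0.453 0.602
```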
Computation of $b_{yx}$ and $r_{xy}$ from a Scatter Diagram. Even
when a machine is available many statisticians prefer to work
from a scatter diagram in order that they may have a general
impression of the linearity of the relationship. Those who prefer
to work directly from the list of raw scores sometimes need to
compute when no machine is available, and then they usually
make use of a scatter diagram. Coding the scores reduces the
size of the numbers involved and thus cuts down on labor.

To code scores, an arbitrary origin is taken at the midpoint
of some interval. This interval is usually either the lowest interval
in the distribution so that all deviations will be positive or it is
near the middle of the distribution so that deviations will be
numerically small. This arbitrary origin will be called A, and
when more than one variable is being studied it will have a
subscript as $A_x$, $A_y$, or $A_z$. Then if $X_j$ and $Y_j$ are the
midpoints of the class intervals, we may define the coded scores as


(10.15) $x'_j = \dfrac{X_j - A_x}{i_x}$ and $y'_j = \dfrac{Y_j - A_y}{i_y}$


where $i_x$ and $i_y$ are the widths of the class intervals.
The actual process of coding is merely to label one interval
as 0, successive intervals above it as 1, 2, 3, . . . and successive
intervals below it, if there are such, as -1, -2, -3, . . . .

The symbol $f_{xy}$ will denote the frequency in a cell in the body
of the scatter diagram, $f_x$ a marginal frequency in an interval of
the horizontal variable, $f_y$ a marginal frequency in an interval of
the vertical variable.

The formulas for r and $b_{yx}$ based upon coded scores are


(10.16) $b_{yx} = \dfrac{i_y}{i_x} \cdot \dfrac{N\sum f_{xy} x'y' - (\sum f_x x')(\sum f_y y')}{N\sum f_x (x')^2 - (\sum f_x x')^2}$

(10.17) and $r_{xy} = \dfrac{N\sum f_{xy} x'y' - (\sum f_x x')(\sum f_y y')}{\sqrt{[N\sum f_x (x')^2 - (\sum f_x x')^2][N\sum f_y (y')^2 - (\sum f_y y')^2]}}$


There are many different routines by which to compute the
sums of scores, sums of squared scores, and sums of products
which enter into Formulas (10.16) and (10.17). The method
shown in Table 10.2 has the advantages of celerity, directness,
and partial checks. The values

$N = \sum f_x = \sum f_y$        $C = \sum f_x x'$

$A = \sum f_y y'$                $D = \sum f_{xy} x'y'$

are computed twice and so their accuracy is beyond doubt. The
values of $\sum f_y (y')^2$ and $\sum f_x (x')^2$ are not checked by this method.
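
The effect of coding can be sketched in a few lines. The example below is an illustration under stated assumptions: the cell frequencies, origins, and interval widths are hypothetical and are not those of Table 10.2, and the sums are accumulated cell by cell, which gives the same totals as the marginal frequencies $f_x$ and $f_y$. It applies Formulas (10.16) and (10.17) to coded scores and compares the result with the same computation on the uncoded midpoints.

```python
from math import sqrt

def coded_b_and_r(cells, a_x, i_x, a_y, i_y):
    """Formulas (10.16) and (10.17).

    cells maps (X midpoint, Y midpoint) to the cell frequency f_xy.
    """
    n = s_x = s_y = s_xx = s_yy = s_xy = 0.0
    for (x_mid, y_mid), f in cells.items():
        xc = (x_mid - a_x) / i_x  # coded x', Formula (10.15)
        yc = (y_mid - a_y) / i_y  # coded y'
        n += f
        s_x += f * xc
        s_y += f * yc
        s_xx += f * xc * xc
        s_yy += f * yc * yc
        s_xy += f * xc * yc
    num = n * s_xy - s_x * s_y
    den_x = n * s_xx - s_x ** 2
    den_y = n * s_yy - s_y ** 2
    b_yx = (i_y / i_x) * num / den_x
    r_xy = num / sqrt(den_x * den_y)
    return b_yx, r_xy

# Hypothetical scatter diagram: midpoints 5 apart on X and 3 apart on Y.
CELLS = {
    (32.5, 60): 2, (37.5, 60): 1, (37.5, 63): 3, (42.5, 63): 2,
    (42.5, 66): 4, (47.5, 66): 1, (47.5, 69): 2,
}

coded = coded_b_and_r(CELLS, a_x=32.5, i_x=5, a_y=60, i_y=3)
# Origin 0 and interval width 1 leave the midpoints uncoded.
uncoded = coded_b_and_r(CELLS, a_x=0, i_x=1, a_y=0, i_y=1)
print(coded, uncoded)
```

The two pairs of values agree: r is unaffected by the choice of origin and interval width, and the factor $i_y / i_x$ restores b to the original score units.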


For a large-scale enterprise, the authors recommend that
computations be made by IBM machines. For a smaller study when
a scatter diagram is to be used, they recommend using one of
the various published charts which provides for a complete check
at every step, such as the Durost-Walker Chart published by the
World Book Company.

Grouping scores into class intervals tends to produce a slight 
increase in variance and so a slight decrease in r. The error thus 
created is negligible if there are 12 or more intervals for each 
variable and N is large. The error is likely to be greater when 
N is small and when intervals are broad. In the present problem 
such grouping error accounts for the discrepancy between r = .602 
when computed directly from raw scores and r = .597 when com- 
puted from data grouped into class intervals. 

A Mathematical Model for Linear Regression. The mathematical
model which is commonly used in dealing with regression
estimates of Y is one in which the values of X are the same for
all possible samples as they are for the observed sample. Therefore
the population mean of X is $\bar{X}$ as this quantity is calculated
from the sample of observations. The X's are sometimes called
known parameters.

$\mu_x = \bar{X}$


The Y's, however, are random variables. For each observed
X, the population distribution of Y is normal with a mean which
depends on X and a variance which is the same for each X. The
population mean of Y for a given X is written
(10.18) $\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$

The constants $\mu_y$, $\beta_{yx}$ and $\bar{X}$ are to be estimated from the sample.
Of these $\bar{X}$ is fully determined from the sample as the mean of
X's, and this mean is not an estimate of a parameter but is in
fact that parameter. The remaining constants are unknown
parameters which are merely estimated from the sample. The
estimate of $\mu_y$ is $\bar{Y} = \sum Y / N$. The estimate of $\beta_{yx}$ is $b_{yx}$ as defined
in Formula (10.2).

The variance of Y for a fixed X is
(10.19) $\sigma_{y.x}^2 = \sigma_y^2 (1 - \rho^2)$
where $\sigma_y^2$ and $\rho$ are the variance and the correlation coefficient in
the population. The standard deviation of Y for a fixed X,
(10.20) $\sigma_{y.x} = \sigma_y \sqrt{1 - \rho^2}$,
is commonly called the standard error of estimate of Y on X. It
is the standard deviation in the population of the residual errors
$Y_\alpha - \mu_{y.x}$.

The best sample estimate of $\sigma_{y.x}^2$ is

(10.21) $s_{y.x}^2 = \dfrac{\sum (Y - \hat{Y})^2}{N - 2}$

(10.22) $s_{y.x}^2 = \dfrac{N - 1}{N - 2} \, s_y^2 (1 - r^2)$

The symbol $s_{y.x}$ had been used by statisticians a long time before
they began to take the idea of degrees of freedom seriously, and
had been defined as $s_{y.x} = s_y \sqrt{1 - r^2}$. However $s_y^2 (1 - r^2)$ is a
biased estimate of $\sigma_y^2 (1 - \rho^2)$ and multiplication by the fraction
$\frac{N - 1}{N - 2}$ corrects the bias. Unless N is small, $\frac{N - 1}{N - 2}$ is so near unity
that the traditional definition is nearly as good as the correct one.
A convenient formula for computing $s_{y.x}^2$ is

(10.23) $s_{y.x}^2 = \dfrac{\sum x^2 \sum y^2 - (\sum xy)^2}{(N - 2) \sum x^2}$
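
As a check on the two forms of the estimate, the sketch below evaluates Formulas (10.22) and (10.23) from the raw-score sums already obtained for Table 10.1. The variable names are illustrative; the six input values are those listed earlier in the text.

```python
from math import sqrt

# Raw-score sums for Table 10.1, as given in the text.
N, SX, SY, SXX, SYY, SXY = 42, 1978, 2135, 98232, 111399, 102847

# Sums of squares and products of deviations from the means.
sum_x2 = SXX - SX ** 2 / N
sum_y2 = SYY - SY ** 2 / N
sum_xy = SXY - SX * SY / N

# Formula (10.23): directly from the deviation sums.
s2_from_sums = (sum_x2 * sum_y2 - sum_xy ** 2) / ((N - 2) * sum_x2)

# Formula (10.22): from the sample variance of Y and r squared.
r_squared = sum_xy ** 2 / (sum_x2 * sum_y2)
s_y_squared = sum_y2 / (N - 1)
s2_from_r = (N - 1) / (N - 2) * s_y_squared * (1 - r_squared)

# The traditional definition omits the (N - 1)/(N - 2) correction.
s_traditional = sqrt(s_y_squared * (1 - r_squared))

print(sqrt(s2_from_sums), sqrt(s2_from_r), s_traditional)
```

With N = 42 the corrected standard error of estimate comes to about 6.76, and the traditional value is only slightly smaller, as the text anticipates for samples that are not small.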


Sampling Distributions of Statistics in Linear Regression.
The distributions of the statistics $\bar{Y}$, $b_{yx}$ and $s_{y.x}$ must be known
in order to test hypotheses or make interval estimates for the
parameters $\mu_y$, $\beta_{yx}$, and $\sigma_{y.x}^2$. These distributions can be derived
mathematically from the mathematical model for regression. The
constant $\bar{X}$ being by assumption the same for all samples is a
parameter which has no variability and no sampling distribution.

1. The mean $\bar{Y}$. Imagine a total population from which
repeated samples of N cases are drawn by unrestricted random
sampling. As was stated in Chapter 7, the distribution of $\bar{Y}$
would then be normal with mean $\mu_y$ and standard error $\sigma_y / \sqrt{N}$,
and the statistic

$t = \dfrac{(\bar{Y} - \mu_y) \sqrt{N}}{s_y}$

would have "Student's" distribution. This is a familiar relation.
However, the value of $\bar{Y}$ which occurs in the regression equation
is the mean of Y values in a sample which has a particular fixed
set of X values, and $\mu_y$ is the value of the mean of a subpopulation
which includes individuals with those X values and no others.
Therefore it is reasonable that if Y is related to X the $\bar{Y}$ in the
regression equation would have a standard error smaller than $\sigma_y / \sqrt{N}$.
The subscript x may be used, writing $\bar{Y}_x$ for the mean of a sample
with fixed set of X values and $\mu_y$ for the corresponding population
value. Then $\bar{Y}_x$ is normally distributed around $\mu_y$ with
standard error

(10.24) $\sigma_{\bar{Y}_x} = \sigma_y \sqrt{\dfrac{1 - \rho^2}{N}} = \dfrac{\sigma_{y.x}}{\sqrt{N}}$

(10.25) The statistic $t = \dfrac{(\bar{Y}_x - \mu_y) \sqrt{N}}{s_{y.x}}$
has "Student's" distribution with N - 2 degrees of freedom.
2. The regression coefficient $b_{yx}$. The sample regression
coefficient $b_{yx}$ is normally distributed around $\beta_{yx}$. The sample estimate
of its standard error is

(10.26) $s_b = \dfrac{s_{y.x}}{s_x \sqrt{N - 1}}$

(10.27) and $t = \dfrac{b_{yx} - \beta_{yx}}{s_b} = \dfrac{(b_{yx} - \beta_{yx}) \, s_x \sqrt{N - 1}}{s_{y.x}}$

has "Student's" distribution with N - 2 degrees of freedom.
To test H: $\beta_{yx} = 0$, the t ratio reduces to

(10.28) $t = \dfrac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}$ which is equal to

(10.29) $t = \dfrac{\sum xy \, \sqrt{N - 2}}{\sqrt{\sum x^2 \sum y^2 - (\sum xy)^2}}$


3. The standard error of estimate $s_{y.x}$. The statistic

(10.30) $\chi^2 = \dfrac{(N - 2) s_{y.x}^2}{\sigma_{y.x}^2} = \dfrac{\sum x^2 \sum y^2 - (\sum xy)^2}{\sigma_{y.x}^2 \sum x^2}$

has the chi-square distribution with N - 2 degrees of freedom.
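
The three forms of the t ratio for testing $\beta_{yx} = 0$ can be compared on the data of Table 10.1. The sketch below starts from the raw-score sums given earlier; the variable names are illustrative, and $(N - 1) s_x^2$ is computed as $\sum x^2$, the sum of squared deviations of X.

```python
from math import sqrt

# Raw-score sums for Table 10.1, as given in the text.
N, SX, SY, SXX, SYY, SXY = 42, 1978, 2135, 98232, 111399, 102847

sum_x2 = SXX - SX ** 2 / N
sum_y2 = SYY - SY ** 2 / N
sum_xy = SXY - SX * SY / N

b_yx = sum_xy / sum_x2
r = sum_xy / sqrt(sum_x2 * sum_y2)
s_yx = sqrt((sum_x2 * sum_y2 - sum_xy ** 2) / ((N - 2) * sum_x2))

# (10.26): s_x * sqrt(N - 1) equals the square root of sum_x2.
s_b = s_yx / sqrt(sum_x2)

t_from_slope = (b_yx - 0.0) / s_b                      # (10.27), beta = 0
t_from_r = r * sqrt(N - 2) / sqrt(1 - r ** 2)          # (10.28)
t_from_sums = (sum_xy * sqrt(N - 2)
               / sqrt(sum_x2 * sum_y2 - sum_xy ** 2))  # (10.29)

print(t_from_slope, t_from_r, t_from_sums)
```

All three give t of about 4.77 with 40 degrees of freedom, which would then be referred to a table of "Student's" distribution.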
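Confidence Interval for $\mu_y$, $\beta_{yx}$, $\sigma_{y.x}$ and $\mu_{y.x}$. Interval
estimates can readily be developed from Formulas (10.25), (10.27), and
(10.30). If $1 - \alpha$ is the confidence coefficient, these estimates are

(10.31) $\bar{Y}_x + t_{\alpha/2} \dfrac{s_{y.x}}{\sqrt{N}} < \mu_y < \bar{Y}_x + t_{1-\alpha/2} \dfrac{s_{y.x}}{\sqrt{N}}$

(10.32) $b_{yx} + t_{\alpha/2} \dfrac{s_{y.x}}{s_x \sqrt{N - 1}} < \beta_{yx} < b_{yx} + t_{1-\alpha/2} \dfrac{s_{y.x}}{s_x \sqrt{N - 1}}$

(10.33) $\dfrac{(N - 2) s_{y.x}^2}{\chi^2_{1-\alpha/2}} < \sigma_{y.x}^2 < \dfrac{(N - 2) s_{y.x}^2}{\chi^2_{\alpha/2}}$

(10.34) $\hat{Y} + t_{\alpha/2} \, s_{y.x} \sqrt{\dfrac{1}{N} + \dfrac{(X - \bar{X})^2}{(N - 1) s_x^2}} < \mu_{y.x} < \hat{Y} + t_{1-\alpha/2} \, s_{y.x} \sqrt{\dfrac{1}{N} + \dfrac{(X - \bar{X})^2}{(N - 1) s_x^2}}$


For each value of X the confidence interval for
$\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$
is a pair of points given by Formula (10.34). These points lie on
two curved lines, one of which is above and one below the sample
line of regression. These curved lines are closest to the sample
line when $X = \bar{X}$ and depart further from it as X departs from $\bar{X}$.
The t-table is entered with N - 2 degrees of freedom.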
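
The interval estimates of Formulas (10.32) and (10.34) can be sketched as follows for the data of Table 10.1. Two assumptions should be noted: the tabled value 2.021 for t with 40 degrees of freedom at the two-sided .05 level is taken from a standard t table rather than computed, and the function names are illustrative. Since the t distribution is symmetric, $t_{\alpha/2} = -t_{1-\alpha/2}$, so each interval is written as an estimate plus and minus a half-width.

```python
from math import sqrt

# Raw-score sums for Table 10.1, as given in the text.
N, SX, SY, SXX, SYY, SXY = 42, 1978, 2135, 98232, 111399, 102847
T_975 = 2.021  # assumed table value: t for 40 d.f., confidence .95

mean_x, mean_y = SX / N, SY / N
sum_x2 = SXX - SX ** 2 / N  # equals (N - 1) * s_x**2
sum_y2 = SYY - SY ** 2 / N
sum_xy = SXY - SX * SY / N
b_yx = sum_xy / sum_x2
s_yx = sqrt((sum_x2 * sum_y2 - sum_xy ** 2) / ((N - 2) * sum_x2))

def slope_interval():
    """Formula (10.32): interval estimate for beta_yx."""
    half = T_975 * s_yx / sqrt(sum_x2)
    return b_yx - half, b_yx + half

def mean_y_interval(x):
    """Formula (10.34): interval estimate for mu_y.x at a given X."""
    y_hat = mean_y + b_yx * (x - mean_x)
    half = T_975 * s_yx * sqrt(1 / N + (x - mean_x) ** 2 / sum_x2)
    return y_hat - half, y_hat + half

print(slope_interval())
print(mean_y_interval(mean_x))  # narrowest interval, at X = mean of X
print(mean_y_interval(39))      # wider, since 39 lies away from the mean
```

Evaluating `mean_y_interval` over a range of X values traces the two curved lines described above.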


EXERCISE 10.1
Use the data from Table 10.1 (p. 235) to answer questions 1 and 2.
1. Test these hypotheses: (a) $\beta_{yx} = 0$; (b) $\beta_{yx} = .6$; (c) $\mu_y = 52$.
2. With $\alpha = .05$ find an interval estimate (a) for $\beta_{yx}$, (b) for $\mu_y$,
(c) for $\sigma_{y.x}^2$, (d) for $\mu_{y.x}$ for an individual whose X score is 39.
3. The following data are made artificially simple in order to enable
you to make the necessary computations very easily and
concentrate your attention on the method.


N = 27               $\sum x'y' = -83$       $i_x = 5$    $A_x = 32.5$
$\sum x' = -45$      $\sum y' = 81$          $i_y = 3$    $A_y = 60$
$\sum (x')^2 = 219$  $\sum (y')^2 = 412$


(a) Find $\bar{X}$ and $\bar{Y}$

(b) Find $b_{yx}$

(c) Find the regression equation to predict Y from X

(d) Find $s_{y.x}^2$

(e) With confidence coefficient .99, what is the interval estimate for
(1) $\beta_{yx}$, (2) $\mu_y$, (3) $\sigma_{y.x}^2$?

(f) Test the hypothesis (1) $\beta_{yx} = 0$, (2) $\beta_{yx} \le 0$, (3) $\beta_{yx} = .65$,
at significance level .01.


Component Sums of Squares in Regression. The Y score of
any given individual ($Y_\alpha$) is the sum of three components: (1) the
mean of the entire Y distribution ($\bar{Y}$); (2) a regression estimate
($\hat{Y}_\alpha - \bar{Y}$) of his deviation from that mean made on the basis of his
deviation from the mean of X; and (3) the deviation of his score
from the regression estimate, or the residual error ($Y_\alpha - \hat{Y}_\alpha$). In
symbols

$Y_\alpha = \bar{Y} + (\hat{Y}_\alpha - \bar{Y}) + (Y_\alpha - \hat{Y}_\alpha)$
or $Y_\alpha - \bar{Y} = (\hat{Y}_\alpha - \bar{Y}) + (Y_\alpha - \hat{Y}_\alpha)$

Therefore
$(Y_\alpha - \bar{Y})^2 = (\hat{Y}_\alpha - \bar{Y})^2 + 2(\hat{Y}_\alpha - \bar{Y})(Y_\alpha - \hat{Y}_\alpha) + (Y_\alpha - \hat{Y}_\alpha)^2$

When this expression is summed for all individuals in the sample,
the cross product term is found to be identically equal to zero.
Then we have the very important relation

(10.35) $\sum_{\alpha=1}^{N} (Y_\alpha - \bar{Y})^2 = \sum_{\alpha=1}^{N} (\hat{Y}_\alpha - \bar{Y})^2 + \sum_{\alpha=1}^{N} (Y_\alpha - \hat{Y}_\alpha)^2$

which may be put into words as follows:

The sum of the squares of the deviations of the observations from
the mean of all observations is equal to the sum of the squares
of the deviations of the regression estimates from the mean of all
observations and the sum of the squares of the deviations of the
observations from their regression estimates.

The left-hand member of Formula (10.35) is already well known
as equal to $(N - 1) s_y^2$. The two expressions on the right can be
reduced to expressions in $s_y^2$ and r:

(10.36) $\sum_{\alpha=1}^{N} (\hat{Y}_\alpha - \bar{Y})^2 = (N - 1) r^2 s_y^2$

(10.37) and $\sum_{\alpha=1}^{N} (Y_\alpha - \hat{Y}_\alpha)^2 = (N - 1) s_y^2 (1 - r^2)$

The algebra which leads to these results will not be reproduced
here but it consists in substituting for $\hat{Y}$ its value from Formulas
(10.1) and (10.2), squaring the binomial, summing and combining
similar terms. Then

(10.38) $(N - 1) s_y^2 = (N - 1) r^2 s_y^2 + (N - 1)(1 - r^2) s_y^2$

(10.39) and $s_y^2 = r^2 s_y^2 + (1 - r^2) s_y^2$

This is an important relationship leading to an interpretation of
r. The total variance among the Y scores $s_y^2$ has been divided
into two components. The component $r^2 s_y^2$ comes from variation
of $\hat{Y}_\alpha$ about $\bar{Y}$ and so is the portion due to regression. The
component $(1 - r^2) s_y^2$ comes from the variation of $Y_\alpha$ about $\hat{Y}_\alpha$, that
is the variation of scores about the regression line, and is variation
due to some source other than regression on X. Therefore
$r^2$ is the proportion of the Y variance attributable to the
relation of Y to X
$1 - r^2$ is the proportion of the Y variance not attributable to the
relation of Y to X.

The relationship in (10.38) may be regarded as a partition of
the sum of squares of the same general nature as the partition
described in Chapter 9 as analysis of variance. This partition is
displayed in Table 10.3.


TABLE 10.3 Analysis of Variance in Regression

Source of variation          Degrees of   Sum of                        Mean
                             freedom      squares                       square

Regression estimates
  around $\bar{Y}$           1            $(N - 1) r^2 s_y^2$           $(N - 1) r^2 s_y^2$
Observations around
  regression estimates       N - 2        $(N - 1)(1 - r^2) s_y^2$      $\frac{(N - 1)(1 - r^2) s_y^2}{N - 2} = s_{y.x}^2$

Observations around $\bar{Y}$   N - 1     $(N - 1) s_y^2$               $s_y^2$

From the two components of the sum of squares an F ratio
can be obtained to test the hypothesis that the regression
coefficient is zero in the population, so X is of no value at all in
estimating Y.

H: $\beta_{yx} = 0$

$F = \dfrac{(N - 1) r^2 s_y^2}{1} \Big/ \dfrac{(N - 1)(1 - r^2) s_y^2}{N - 2}$

(10.40) or $F = \dfrac{r^2 (N - 2)}{1 - r^2}$

Since F has 1 degree of freedom for the numerator variance and
N - 2 for the denominator,

(10.28) $t = \sqrt{F} = \dfrac{r \sqrt{N - 2}}{\sqrt{1 - r^2}}$ with N - 2 degrees of freedom.

Thus the formula already given as (10.28) on page 241 has now
been reached again by analysis of variance.
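
For the data of Table 10.1 the analysis of variance can be laid out as in the sketch below, which starts from the raw-score sums given earlier. The variable names are illustrative. The F ratio is formed from the two mean squares and compared with Formula (10.40).

```python
from math import sqrt

# Raw-score sums for Table 10.1, as given in the text.
N, SX, SY, SXX, SYY, SXY = 42, 1978, 2135, 98232, 111399, 102847

sum_x2 = SXX - SX ** 2 / N
sum_y2 = SYY - SY ** 2 / N  # equals (N - 1) * s_y**2
sum_xy = SXY - SX * SY / N
r_squared = sum_xy ** 2 / (sum_x2 * sum_y2)

# Rows of Table 10.3.
ss_regression = r_squared * sum_y2      # 1 degree of freedom
ss_residual = (1 - r_squared) * sum_y2  # N - 2 degrees of freedom
ms_regression = ss_regression / 1
ms_residual = ss_residual / (N - 2)     # this is s_y.x squared

f_from_table = ms_regression / ms_residual
f_from_r = r_squared * (N - 2) / (1 - r_squared)  # (10.40)
t = sqrt(f_from_table)                            # (10.28)

print(f_from_table, f_from_r, t)
```

F comes to about 22.76 with 1 and 40 degrees of freedom, and its square root is the t of about 4.77 obtained from Formula (10.28).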
This partition of the sum of squares shows the effect of the
size of the correlation coefficient upon the precision of estimate.
If r is large, most of the variance in the Y scores can be attributed
to the information contributed by the regression equation. If r
is small, little information is contributed by this equation. The
variation must then be explained by other means if at all. The
F test may be used to determine whether the information yielded
by the regression study makes any contribution at all.

Sometimes F is to be computed directly from original data
and the value of r is not needed for any other purpose. In that
case, ease of computation and desire to minimize the effect of
rounding errors make it expedient to use Formula (10.41) which
is equivalent to (10.40) and closely related to (10.28).

(10.41) $F = \dfrac{r^2 (N - 2)}{1 - r^2} = \dfrac{(\sum xy)^2 (N - 2)}{\sum x^2 \sum y^2 - (\sum xy)^2}$

Test for Linearity of Regression. If regression is truly linear
in the population, the mean $\mu_j$ of all values of Y which are
associated with a particular $X_j$ will lie precisely on the population
regression line: $\mu_{y.x} = \mu_y + \beta_{yx}(X - \bar{X})$. In this case, $\mu_j = \mu_{y.x_j}$
for every j. However, even when regression is linear in the
population, the sample values of $\bar{Y}_j$ will not agree precisely with the
population values of $\mu_j$ and will not all lie on the sample regression
line $\hat{Y} = \bar{Y} + b_{yx}(X - \bar{X})$. It is apparent that a test of the
hypothesis $\mu_j - \mu_{y.x_j} = 0$ should involve the sum of squares of the
sample differences $\bar{Y}_j - \hat{Y}_j$.

To simplify the formulas needed we shall use these additional
symbols:


$T$ = sum of all y' scores in table $= \sum_{\alpha=1}^{N} y'_\alpha$

$T_j = \sum_{\alpha=1}^{N_j} y'_{j\alpha}$ = sum of all y' scores associated with $X_j$

C = sum of squares of Y scores about regression $= \sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \hat{Y}_j)^2$

W = sum of squares of Y scores about column means

(10.42) $W = \sum_{j=1}^{k} \sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y}_j)^2$

M = sum of squares of Y means about regression line

(10.43) $M = \sum_{j=1}^{k} N_j (\bar{Y}_j - \hat{Y}_j)^2$

As W and M are independently distributed, they can be used
with their degrees of freedom in an F test, and W + M = C.

C has N - 2 degrees of freedom
W has N - k degrees of freedom
and M has k - 2 degrees of freedom
The statistic

(10.44) $F = \dfrac{M / (k - 2)}{W / (N - k)} = \dfrac{M}{W} \cdot \dfrac{N - k}{k - 2}$

provides a test for the hypothesis of linearity. Formulas to
facilitate the computation of M and W from an arbitrary origin
are needed.

(10.45) $C = \sum y^2 - \dfrac{(\sum xy)^2}{\sum x^2}$

where x and y are deviations from means, not from arbitrary
origins.

(10.46) $W = i_y^2 \left[ \sum (y')^2 - \sum_{j=1}^{k} \dfrac{T_j^2}{N_j} \right]$

(10.47) $M = C - W$


These formulas will now be applied to the data in Table 10.2.

$N \sum y^2 = 9(13172)$
$N \sum x^2 = 9(23292)$
$N \sum xy = 9(10452)$

$C = \dfrac{9}{42} \left[ 13172 - \dfrac{(10452)^2}{23292} \right] = 1817.53$

$\sum_{j=1}^{k} \dfrac{T_j^2}{N_j} = 2386.62$

$W = 9(2514 - 2386.62) = 1146.42$
$M = 1817.53 - 1146.42 = 671.11$

$F = \dfrac{671.11}{13 - 2} \Big/ \dfrac{1146.42}{42 - 13} = \dfrac{671.11}{1146.42} \cdot \dfrac{29}{11} = 1.54$

$F_{.95} = 2.14$

Bivariate Population Model. The regression model described
on page 239 is appropriate to use when interest centers around
the problem of estimating one unknown variable from another
known variable. In problems about the mutual relationship of
two variables or in problems in which it is necessary to find an
estimate of either variable from the other, that model will not
suffice. For example, suppose two tests, X and Y, have been given
to a group of subjects on different days, a few subjects being
absent on one day and a few on the other. It is for some reason
not feasible to obtain the missing scores by administering the test
again. From the subjects for whom both scores are available,
two regression equations are computed, one to estimate the missing
scores in X from observed scores in Y and the other to estimate the
missing scores in Y from observed scores in X.

In the mathematical model used for problems of this sort each
individual is a pair of observations. A sample is then a sample of
pairs from the population of paired observations. In this model
both X's and Y's vary randomly from sample to sample, whereas
in the regression model previously described the X's are fixed for
all samples and only the Y's vary. A population of this type is
called bivariate.

The probability distribution of a population on a single con- 
tinuous variate, that is a univariate population, was discussed in 
Chapter 5. That distribution consists of all the points on a line 
segment (which may or may not extend to infinity) and proba- 
bilities corresponding to specified intervals on that line. Those 
probabilities are represented by the areas under the probability 
curve between two ordinates. 

For a continuous bivariate population, the scales of the two 
traits will be represented by two coordinate axes, and each pair 
of scores for one individual will locate a point in the plane of those 
axes. The bivariate probability distribution then consists of all 
points in a portion of this plane (which portion may or may not 
extend to infinity) and the appropriate probability values. These 
probability values are represented by volumes. The probability
that a < X < b and simultaneously c < Y < d is written

$P\{a < X < b,\ c < Y < d\}$

This probability is a volume within a solid figure whose base and
4 sides are planes but whose top may not be a plane. The X, Y
scales, the intervals and probability are shown in Figure 10-1.
The probabilities of the type shown in the figure may all be
described as volumes under a surface which has a mathematical
description.

FIG. 10-1. Segment of volume representing the probability
$P\{a < X < b,\ c < Y < d\}$ in a bivariate distribution.

In any bivariate population, each variable has its own moments
and so the probability distribution involves the four parameters
$\mu_x$, $\mu_y$, $\sigma_x$, and $\sigma_y$. In addition the bivariate population has a
product moment which is the expectation $E(X - \mu_x)(Y - \mu_y)$. This
product moment is also called the covariance. The correlation
coefficient of X and Y in the population is defined as

(10.48) $\rho = \dfrac{E[(X - \mu_x)(Y - \mu_y)]}{\sigma_x \sigma_y}$


Normal Bivariate Population. In this chapter we shall be
concerned with only one type of bivariate population, namely
the normal bivariate, which is an extension of the univariate normal
population to the bivariate situation. The tests of significance
described in the remainder of this chapter are based on the
assumption of such a population and do not necessarily apply if the
population differs too greatly from that form.

For the normal univariate distribution of X, every point on a
line represents a possible value of X. To set up the probability
curve for X, an axis is drawn perpendicular to the scale of X. On
this axis the ordinate (which we shall call u) is measured to
represent the probability density of X. Then as already discussed in
Chapter 5,

(10.49) $u = \dfrac{1}{\sigma_x \sqrt{2\pi}} \, e^{-\frac{(X - \mu_x)^2}{2\sigma_x^2}}$

This curve has two parameters, $\mu_x$ and $\sigma_x$. The ordinate u is not
a probability because probability would be represented by the
area under the normal curve between two ordinates, but it may
be thought of as something like the average height of such an area
when its base is very small so that the two ordinates are nearly
the same.

For the normal bivariate distribution of X and Y, every point
on a plane represents a possible pair of values. To set up the joint
probability surface for X and Y, an axis is drawn perpendicular
to that plane. On this axis the ordinate (which we shall call u)
is measured to represent the probability density of the X, Y
distribution. For each point this ordinate is

(10.50) $u = \dfrac{1}{2\pi \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\dfrac{1}{2(1 - \rho^2)} \left[ \dfrac{(X - \mu_x)^2}{\sigma_x^2} - \dfrac{2\rho (X - \mu_x)(Y - \mu_y)}{\sigma_x \sigma_y} + \dfrac{(Y - \mu_y)^2}{\sigma_y^2} \right] \right\}$

This probability surface has 5 parameters, $\mu_x$, $\mu_y$, $\sigma_x$, $\sigma_y$, and $\rho$.
The ordinate u is not a probability because probability is
represented by a volume, but it may be thought of as something like
the average height of such a volume when the area of its base is
very small, so that all the ordinates are nearly the same.
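
The two ordinates can be written as short functions. The sketch below is illustrative only: the function names and the parameter values are assumptions, not taken from the text. It evaluates Formulas (10.49) and (10.50) and shows that when $\rho = 0$ the bivariate ordinate is simply the product of the two univariate ordinates.

```python
from math import exp, pi, sqrt

def normal_density(x, mu, sigma):
    """Univariate normal ordinate u, Formula (10.49)."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def bivariate_normal_density(x, y, mu_x, mu_y, sigma_x, sigma_y, rho):
    """Bivariate normal ordinate u, Formula (10.50). Requires |rho| < 1."""
    zx = (x - mu_x) / sigma_x
    zy = (y - mu_y) / sigma_y
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    return exp(-0.5 * q) / (2 * pi * sigma_x * sigma_y * sqrt(1 - rho * rho))

# Illustrative parameter values (assumed for this example only).
MU_X, MU_Y, SIGMA_X, SIGMA_Y = 47.0, 51.0, 11.0, 8.0

u_correlated = bivariate_normal_density(50, 55, MU_X, MU_Y, SIGMA_X, SIGMA_Y, 0.6)
u_uncorrelated = bivariate_normal_density(50, 55, MU_X, MU_Y, SIGMA_X, SIGMA_Y, 0.0)
u_product = normal_density(50, MU_X, SIGMA_X) * normal_density(55, MU_Y, SIGMA_Y)
u_peak = bivariate_normal_density(MU_X, MU_Y, MU_X, MU_Y, SIGMA_X, SIGMA_Y, 0.6)

print(u_correlated, u_uncorrelated, u_product, u_peak)
```

The ordinate is greatest at the point of means and is a density, not a probability; a probability would be obtained only by taking the volume over a region of the plane.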

When the parameters are given definite values, u is defined for
each point (X, Y), and varies from point to point. Populations
with different parameters will have different probability distri- 
butions. 

The normal bivariate probability distribution is traditionally 
spoken of as having the form of a “cocked hat.” The marginal 
distributions for X and for Y are both normal. Furthermore, if 
the three dimensional solid is cut by any plane perpendicular to 
the (X, Y) plane, the intersection is a normal curve. 

If any bivariate probability distribution, whether normal or 
not, is cut by a plane parallel to the X, Y plane, the intersection 
will be a kind of contour line marking out those points in the plane 
which have equal probability densities, that is points at which the 
ordinates are equal. For the normal bivariate distribution all 
such contour lines are ellipses with their center at the point 


$(X = \mu_x,\ Y = \mu_y)$


All such ellipses for one distribution have the same axes so they do 
not intersect each other. 

Each normal surface has its own set of ellipses. These ellipses 
are of particular interest because when a sample from a normal 
bivariate population is plotted on a scatter diagram the distribu- 
tion of tallies is concentrated about the point representing the 
sample means in accordance with the form of the ellipses obtained 
from the distribution. This phenomenon is familiar to all persons 
who have worked with scatter diagrams and corresponds to the 
concentration of tallies about the mean of a univariate normal 
population. Figure 10-2 shows how these ellipses change as the 
parameters of the distribution are changed. 

Distribution of the Correlation Coefficient. One difficulty in
problems of statistical inference in relation to the correlation
coefficient arises because the range of this coefficient is -1 to +1 both
for r and $\rho$. Hence r can never have a truly normal distribution
regardless of the form of the population sampled. When the
population is bivariate normal and $\rho = 0$ the distribution of r is
symmetrical and when the number of cases is large the
distribution of r can be reasonably approximated by the normal curve.

FIG. 10-2. Ellipses of equal probability density for normal bivariate
surfaces having different values of the five parameters, $\mu_x$, $\mu_y$,
$\sigma_x$, $\sigma_y$, $\rho_{xy}$.

When $\rho$ differs from zero the sample r's cluster about the
population value as may be expected. But now there is less space
on one side of $\rho$ than on the other. Hence the clustering is denser
on one side than on the other, and the distribution of r is skewed.
A mathematical formula which describes the distribution of r
for any value of $\rho$ was derived by R. A. Fisher in a classic paper
which ushered in a new way of thinking about statistical
distributions (see reference 6), but it is very complicated, involving $\rho$
and N as well as r. This exact formula for the sampling
distribution of r is seldom used but as substitute there are several
approximations which are valid under certain assumptions and give
results accurate enough for all practical purposes. The exact
distribution of r has been published by David with a discussion of
the accuracy of approximation of the various substitute methods
for testing the significance of r.

When N = 2, r can take only the values +1 and −1 and so the
distribution consists merely of an ordinate at each of these points. 
When N = 3 and p = 0 the distribution of r is U-shaped, that is, 
symmetrical but lower in the center than at the ends. As N in- 
creases the distribution becomes higher in the center and lower 
at the ends until for N = 30 or more, its form is approximately 
normal. 

Tests of the Hypothesis that Correlation is Zero in the Population.
When a normal bivariate population has the parameter
ρ = 0, the statistic

$F = \frac{r^2 (N - 2)}{1 - r^2}$

has the F distribution with 1 degree of freedom for the numerator
and N − 2 for the denominator. Consequently $\sqrt{F} = t$, and the
statistic

(10.28)   $t = \frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}}$

has "Student's" distribution with N − 2 degrees of freedom. This
is an exact distribution and applies to samples of 3 or more.
(Obviously correlation is meaningless in a sample of 2 cases because
r could have only the values +1 and −1 unless both values of
one variate were alike and then r would be indeterminate.) The
reader should notice that while r ranges only from −1 to +1, t
ranges from −∞ to +∞.

Formula (10.28) has already appeared as providing a test for
the hypothesis $\beta_{yx} = 0$. But when ρ = 0, $\beta_{yx} = 0$ and when $\beta_{yx} = 0$,
ρ = 0, so the hypotheses ρ = 0 and β = 0 are really the same.

Table XI shows for varying values of n = N — 2 the value of r 
such that the critical region for rejection of the hypothesis p = 0 
consists of all values of r numerically greater than the tabulated 
value of r. 

A second test of the hypothesis ρ = 0 is available when the
sample is not small, say N = 30 or more. This test is obtained
from the fact that if ρ = 0 and N is not small the distribution of r
is approximately normal with mean zero and standard error
$1/\sqrt{N - 1}$. Therefore, under these circumstances

(10.51)   $z = r\sqrt{N - 1}$

has approximately the unit normal distribution.
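The two test statistics for the hypothesis ρ = 0 can be put side by side in a short computation. The following is a minimal sketch in Python; the function names and the sample values r = .30, N = 50 are illustrative assumptions, not taken from the text.

```python
import math


def t_for_zero_correlation(r: float, n: int) -> float:
    """Formula (10.28): t = r*sqrt(N-2)/sqrt(1-r^2), with N-2 degrees of freedom."""
    if n < 3 or abs(r) >= 1.0:
        raise ValueError("need N >= 3 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)


def z_for_zero_correlation(r: float, n: int) -> float:
    """Formula (10.51): z = r*sqrt(N-1), roughly unit normal when N is not small."""
    return r * math.sqrt(n - 1)


# Illustrative values only (hypothetical, not from the text).
r, n = 0.30, 50
t = t_for_zero_correlation(r, n)
z = z_for_zero_correlation(r, n)
print(f"t = {t:.3f} on {n - 2} degrees of freedom; z = {z:.3f}")
```

The value of t is referred to "Student's" distribution and the value of z to the unit normal curve; the two statistics are not expected to coincide, only to lead to similar decisions when N is not small.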

Warning. A formula which is widely but incorrectly used should
be mentioned in order that the student may understand why it is
no longer in good repute. This formula is sometimes written
$s_r = \frac{1 - r^2}{\sqrt{N}}$ and sometimes $s_r = \frac{1 - r^2}{\sqrt{N - 1}}$. The standard error of r
is correctly given by the formula

(10.52)   $\sigma_r = \frac{1 - \rho^2}{\sqrt{N - 1}}$

But this involves the unknown ρ. If ρ = 0, this formula becomes

(10.53)   $\sigma_r = \frac{1}{\sqrt{N - 1}}$

as previously stated. There is no justification for substituting
r for ρ in (10.52) to produce the incorrect formulas quoted at the
beginning of this paragraph and to do so sometimes produces
results that are extremely misleading.

Another incorrect practice which is so common that it cannot
be passed over in silence is that of computing a probable error of
r as $.6745(1 - r^2)/\sqrt{N - 1}$ and attaching it to r, as for example,
r = .65 ± .03. A "probable error" has meaning only in a normal
distribution and so is obviously out of place in this connection.
EXERCISE 10.2 

Compute both $r\sqrt{N - 2}/\sqrt{1 - r^2}$ and $r\sqrt{N - 1}$ for r = .06, .20,
and .60 and N = 5, 26, 82, 102, and 401. Examine the results to see
whether the two formulas appear to be in closer agreement when N is
large or N is small. Is there anything in the foregoing explanation which
would account for this apparent tendency? Do the two formulas appear
to agree more closely for large or for small r? Can you account for this
tendency?


Tests that ρ is Some Value Other than Zero. When ρ is
different from zero, the distribution of r is skewed and neither of
the tests described in the previous section can be used. To obtain
the data of Figures 10-3 and 10-4 samples were drawn from a
bivariate population for which ρ was .58. The distribution of
r in samples of 5 cases as shown in Figure 10-3 is very skewed.
For samples of 20 cases, as shown in Figure 10-4, the distribution
is markedly less skewed though still far from normal. As N increases
the distribution of r becomes more nearly symmetrical
so that for large samples with ρ not too near +1 or
−1 treating r as normally distributed would give fairly satisfactory
results. However a much better procedure is available
which does not depend upon the dubious assumption of normality
for r.

Fig. 10-3. Distribution of r in samples of 5 cases from a population
in which ρ = .58.
Fig. 10-4. Distribution of r in samples of 20 cases
from a population in which ρ = .58.

A variable, which is usually called z but which we shall call $z_r$,
in order to distinguish it from the other variables which have
already been denoted by that letter, is related to r by the formula

(10.54)   $z_r = \tfrac{1}{2}\log_e\frac{1 + r}{1 - r}$
(10.55)   $\phantom{z_r} = 1.1513\,\log_{10}\frac{1 + r}{1 - r}$
(10.56)   $\phantom{z_r} = 1.1513\,[\log_{10}(1 + r) - \log_{10}(1 - r)]$

This variable, introduced by R. A. Fisher, has two very great
advantages. Its distribution is approximately normal even for
small samples in which ρ is near 1, and its standard error does not
depend on the unknown population value but only on the size
of the sample,

(10.57)   $\sigma_{z_r} = \frac{1}{\sqrt{N - 3}}$

If ζ is the population value corresponding to ρ, then $(z_r - \zeta)\sqrt{N - 3}$
may be treated as a normally distributed variable.

The transformation of r to $z_r$ and vice versa can be made easily
by reference to Table XII.

Suppose we have r = .76 for N = 228, and we wish to test the
hypothesis that ρ = .80 with α = .05. A correct procedure would
be as follows:

(1) From Table XII we find that if r = .76, $z_r$ = 1.00

(2) From Table XII we find that if ρ = .80, ζ = 1.10

(3) $\sqrt{N - 3} = 15$

(4) Then (1.00 − 1.10)15 = −1.50

(5) As this is a one-sided test, we read from the normal probability
    curve $z_\alpha = z_{.05} = -1.64$. The critical region is
    z < −1.64.

(6) The hypothesis that ρ ≥ .80 is not rejected.

We may contrast with this the decision which might have been
incorrectly obtained by computing $\sigma_r = .024$ and

$(r - \rho)/\sigma_r = -1.67.$

The computations are not incorrect but as we have no probability
distribution to which the statistic −1.67 can be referred,
they lead nowhere. If now we incorrectly refer it to the normal
probability distribution and reject the hypothesis ρ = .80 because
$-1.67 < z_\alpha$, we shall make an incorrect decision.
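The test of ρ = .80 worked above can be retraced without Table XII, since $z_r$ is the inverse hyperbolic tangent of r. The following is a minimal sketch in Python; the function names are illustrative assumptions. Because it does not round $z_r$ and ζ to two decimals as the table lookup does, it yields about −1.54 rather than −1.50, and the decision is unchanged.

```python
import math


def fisher_z(r: float) -> float:
    """Formula (10.54): z_r = 0.5 * ln((1 + r) / (1 - r)), i.e. atanh(r)."""
    return math.atanh(r)


def z_test_for_rho(r: float, rho_0: float, n: int) -> float:
    """(z_r - zeta) * sqrt(N - 3), to be referred to the unit normal curve."""
    return (fisher_z(r) - fisher_z(rho_0)) * math.sqrt(n - 3)


# Values from the worked example: r = .76, N = 228, hypothesis rho = .80.
statistic = z_test_for_rho(0.76, 0.80, 228)
critical = -1.64  # one-sided test with alpha = .05
decision = "reject" if statistic < critical else "do not reject"
print(f"z = {statistic:.2f}: {decision} the hypothesis")
```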


Confidence Interval for ρ. The z-transformation is useful in
obtaining a confidence interval for ρ. The procedure is as follows:

(1) In Table XII find the value of $z_r$ corresponding to the
    sample value of r.

(2) Compute a confidence interval for the population value ζ
    by the formula

(10.58)   $C\left(z_r + \frac{z_{\alpha/2}}{\sqrt{N - 3}} < \zeta < z_r + \frac{z_{1-\alpha/2}}{\sqrt{N - 3}}\right) = 1 - \alpha$

    Here one must remember that $z_r$ and $z_\alpha$ are two quite different
    values represented by the same letter only because of tradition
    and the brevity of our alphabet.

(3) In Table XII read the value of ρ corresponding to each of
    the bounding values of ζ. From these form the interval
    for ρ.

Thus suppose we have obtained r = .65 in a sample of 40 cases
and we want an interval estimate for ρ with confidence coefficient
.95. From Table XII we read $z_r$ = .78. From the normal probability
table we find

$z_{.025} = -1.96$ and $z_{.975} = 1.96$.   $\sqrt{N - 3} = \sqrt{37}$.

Then $.78 - \frac{1.96}{\sqrt{37}} < \zeta < .78 + \frac{1.96}{\sqrt{37}}$   or   .46 < ζ < 1.10
and .43 < ρ < .80

Notice that because of the skewness in the distribution of r the
confidence limits for ρ are not symmetrically placed about the
observed r, but .80 − .65 = .15 while .65 − .43 = .22
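The three steps of the interval procedure can be sketched in a few lines. This is a minimal illustration in Python, assuming the hyperbolic tangent and its inverse in place of Table XII and the standard library's normal quantile in place of the normal probability table; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def rho_confidence_interval(r: float, n: int, confidence: float = 0.95):
    """Interval for rho by the z-transformation, following steps (1) to (3)."""
    z_r = math.atanh(r)                                     # step (1)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2.0)   # 1.96 for .95
    half_width = z_crit / math.sqrt(n - 3)
    zeta_low, zeta_high = z_r - half_width, z_r + half_width  # step (2)
    return math.tanh(zeta_low), math.tanh(zeta_high)          # step (3)


# Values from the worked example: r = .65 in a sample of 40 cases.
low, high = rho_confidence_interval(0.65, 40)
print(f"{low:.2f} < rho < {high:.2f}")
```

Small differences from the limits in the text are to be expected, because the text rounds $z_r$ to two decimals before forming the interval.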

David's Tables of the Correlation Coefficient show the complete
cumulative distribution for r in samples from populations for which
ρ is 0, .1, .2, .3, .4, .5, .6, .7, .8, or .9 and 3 ≤ N ≤ 25, N = 50,
100, 200 or 400. She also presents confidence belts for r for samples
of the sizes listed, with α = .10, α = .05, α = .02 and α = .01.
The chart for α = .05 has been reproduced as Chart XIV in the
Appendix.

Test of the Hypothesis That Two Independent Populations
Have the Same Correlation. If a sample is drawn from each of
two independent, normal bivariate populations the difference
$r_1 - r_2$ should fluctuate around $\rho_1 - \rho_2$. If each r is transformed
to $z_r$ the difference $z_1 - z_2$ will also fluctuate around $\zeta_1 - \zeta_2$ as mean
and will have as standard deviation

$\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}$

Then

(10.59)   $z = \frac{z_1 - z_2 - (\zeta_1 - \zeta_2)}{\sqrt{\frac{1}{N_1 - 3} + \frac{1}{N_2 - 3}}}$

may be treated as a normal deviate. If the hypothesis that
$\zeta_1 = \zeta_2$ is rejected, the hypothesis $\rho_1 = \rho_2$ must also be rejected.
If the hypothesis $\zeta_1 = \zeta_2$ is sustained, the hypothesis $\rho_1 = \rho_2$ must
also be sustained.

In his research concerning Children’s Collecting Activity * Durost 
studied the collections made by 50 boys and 50 girls between the 
ages of 10 and 14. Among the correlations obtained was a correla- 
tion between mental age and the average rating of the child’s 
collections (each of the collections of each child being rated as to 
quality and the average taken for the child). For boys this 
correlation was .31 and for girls .06. Do these figures provide 
evidence that the relation between mental maturity and quality 
of collections made is higher for boys than for girls? 


          r       z_r      N − 3    1/(N − 3)
Boys     .31     .3205      47      .021277
Girls    .06     .0601      47      .021277

$z_B - z_G = .2604$          $\frac{1}{N_B - 3} + \frac{1}{N_G - 3} = .042554$

$z = \frac{.2604}{\sqrt{.0426}} = \frac{.2604}{.206} = 1.26$

With so small a value of z, it would be inappropriate to assume 
that the relationship is higher for boys than for girls. 
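The comparison of the two independent correlations can be checked with a short computation. The following is a minimal sketch in Python of Formula (10.59) under the hypothesis $\zeta_1 = \zeta_2$; the function name is an illustrative assumption.

```python
import math


def compare_independent_r(r1: float, n1: int, r2: float, n2: int) -> float:
    """Normal deviate for the difference of two independent correlations,
    Formula (10.59) with the hypothesized difference zeta_1 - zeta_2 = 0."""
    standard_error = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    return (math.atanh(r1) - math.atanh(r2)) / standard_error


# Durost's data: r = .31 for 50 boys and r = .06 for 50 girls.
z = compare_independent_r(0.31, 50, 0.06, 50)
print(f"z = {z:.2f}")
```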

Test of the Hypothesis That $\rho_{xz} = \rho_{yz}$ When Computed for
the Same Population. This situation is quite different from that
of the preceding section though often confused with it. There we
had measures on the same two variates X and Y for two different
populations. Here we have for one population measures on three
variates, X, Y, and Z, and we wish to know whether Z is more
highly correlated with X than with Y, or vice versa. This question
is particularly important when two predictors are available
but the one showing the higher correlation with the criterion is
more expensive, more time-consuming, or more difficult to apply.
For example, suppose that a prognostic test, long and rather difficult
to administer, has yielded a correlation of .56 with Freshman
college marks, while a shorter test, less costly to administer, has
yielded a correlation of .43, the two correlations being obtained
from the same 100 subjects. If the tests were equally costly the
one yielding the higher correlation would be chosen regardless of
whether the difference in the r's was significant or not. Still
better, both tests would be used, and prediction made from a
multiple regression equation. However let us suppose the college
authorities have decided to use only one test and to use the more
costly test only if its correlation with marks is significantly higher
than that of the other test. If the hypothesis $\rho_{xz} = \rho_{yz}$ proves
tenable, they have decided to use only the shorter test.


Let X = score on the longer test 
Y = score on the shorter test 
Z = average mark at end of first semester 


Hotelling has given a solution of the problem of the significance
of the difference between $r_{xz}$ and $r_{yz}$ without making any assumption
as to the form of distribution of X or of Y in the population,
but with the limitation that generalization is only to a subpopulation
of all possible samples for which X and Y have exactly the
same set of values as those in the observed sample. It is assumed
that Z has a normal distribution for each value of X and for each
value of Y, with common variance. Then

(10.60)   $t = (r_{xz} - r_{yz})\sqrt{\frac{(N - 3)(1 + r_{xy})}{2(1 - r_{xy}^2 - r_{xz}^2 - r_{yz}^2 + 2 r_{xy} r_{xz} r_{yz})}}$

has "Student's" distribution with N − 3 degrees of freedom.

Obviously it is necessary to know the correlation between the
two tests before the question can be answered. Assume that
$r_{xy} = .52$.

Then $t = (.56 - .43)\sqrt{153.10} = 1.61$

In view of this small value of t it would be difficult to make a
strong case for the use of the longer test if only one test is to be
used. It is of course easy to show that the predictive value of
r = .43 is too low to be of much practical use.
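The arithmetic of Formula (10.60) is easy to get wrong by hand because of the four-term expression in the denominator. The following is a minimal sketch in Python that reproduces the example with $r_{xz}$ = .56, $r_{yz}$ = .43, $r_{xy}$ = .52 and N = 100; the function name is an illustrative assumption.

```python
import math


def hotelling_t(r_xz: float, r_yz: float, r_xy: float, n: int) -> float:
    """Formula (10.60): t for the difference between two correlations that
    share the variate Z, with N - 3 degrees of freedom."""
    denominator = 2.0 * (1.0 - r_xy**2 - r_xz**2 - r_yz**2
                         + 2.0 * r_xy * r_xz * r_yz)
    return (r_xz - r_yz) * math.sqrt((n - 3) * (1.0 + r_xy) / denominator)


# Values from the worked example.
t = hotelling_t(0.56, 0.43, 0.52, 100)
print(f"t = {t:.2f} on {100 - 3} degrees of freedom")
```

The quantity under the radical comes to about 153.1 for these values, in agreement with the text.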


EXERCISE 10.3 

1. (a) In the problem of the preceding paragraph, test the significance
of $r_{xz} - r_{yz}$ if N = 100 and $r_{xz}$ = .56, $r_{yz}$ = .43 as before but $r_{xy}$ = .20.

(b) If $r_{xy}$ = .70.




Regression Equations in the Normal Bivariate Population. In
the normal bivariate population each of the variates may be regarded
as independent and as related to the other by a linear
regression relationship. The two equations are defined in the
population as

(10.61)   $\mu_{y.x} = \mu_y + \beta_{yx}(X - \mu_x)$   and   $\mu_{x.y} = \mu_x + \beta_{xy}(Y - \mu_y)$.

The corresponding standard errors of estimate are

(10.62)   $\sigma_{y.x} = \sigma_y\sqrt{1 - \rho^2}$   and   $\sigma_{x.y} = \sigma_x\sqrt{1 - \rho^2}$

The corresponding regression equations for the sample have this
form

(10.63)   $\tilde{Y} - \bar{Y} = b_{yx}(X - \bar{X})$   and   $\tilde{X} - \bar{X} = b_{xy}(Y - \bar{Y})$

and the corresponding standard errors are

(10.64)   $s_{y.x} = s_y\sqrt{\frac{N - 1}{N - 2}(1 - r^2)}$   and   $s_{x.y} = s_x\sqrt{\frac{N - 1}{N - 2}(1 - r^2)}$


EXERCISE 10.4 
1. A prognostic test in mathematics was given to 98 students who were
about to begin a course in elementary statistics. The results of this test
were examined in relation to scores on the final examination. The pairs
of scores are not given below but data derived from the scores are shown.
In the following, X is a score on the prognostic test and Y the score on the
final examination. The sums of scores, sums of squares and cross-products
obtained by machine computation follow:
ΣX = 3075   ΣY = 4817   ΣX² = 101581   ΣXY = 154468   ΣY² = 245381
a. Compute $\bar{X}$, $\bar{Y}$, $b_{yx}$, $s_x$, $s_y$, $r_{xy}$, $s_{y.x}$
b. Find the regression equation to predict Y from X
c. What proportion of the variance of Y is independent of variation
in X?
d. Test the hypothesis H : ρ = 0
e. What Y value would you predict for a student with X = 62?
X = 38? X = 54?
f. What interval estimate would you make with confidence coefficient
.95
(1) for $\mu_{y.x}$ when X = 42 (2) for $\beta_{yx}$ (3) for ρ (4) for $\sigma_{y.x}$
2. Suppose a study has yielded data as follows: 
N=12, zr = 94, Xa'y-90  i-3 A 
Хгу = 420, ху = — 35, Z(y'? = 598, 4-5,  A,- 
a. What is the equation to estimate Y from X? 
b. What is the equation to estimate X from Y? 
c. Find a confidence interval for ρ with confidence coefficient .99

d. Test the hypothesis that ρ = .60

e. Test the hypothesis that Bey = 1.25 

3. In each of the following situations there is some inconsistency, some
reason why real data would not yield these figures unless a mistake had
been made. Explain what is wrong in each case.

a. $\tilde{Y} = .96X + 13.3$ and $\tilde{X} = 1.42Y - 125$

b. Σxy = 1543, Σx² = 1136, and Σy² = 1560

c. r = .65, $b_{yx}$ = −.24, $b_{xy}$ = −.72

d. r = .60, $b_{yx}$ = .80, $b_{xy}$ = .90

e. r = .60, $s_y$ = 12, $s_{y.x}$ = 7.2

f. r = −.20, $b_{yx}$ = .80, $b_{xy}$ = .50

g =- 9X45, =- 6+4, r= 43 

h. r= .5, 8, = 4, s, = 10, bye = .25 

i, r= .5, 8, = 6, Sy = 3, ў = z, = 25y 

4. In a sample of 203 cases from population A a correlation coefficient
of .66 has been found. In a sample of 153 cases from population B, a
coefficient of .70 has been found. Test H: $\rho_A = \rho_B$.

Additional formulas often useful in computations for correlation and 


regression: 


Iry = (zzv - (2x) y) 2G м, 
d . (2X)(2Y) 
= IXY = FEND 


$2\sum xy = \sum x^2 + \sum y^2 - \sum (x - y)^2$
$\phantom{2\sum xy} = \sum (x + y)^2 - \sum x^2 - \sum y^2$


ia (sey (у) — B(x" — yy 222700) 


1 


2X*+ zr- x(x - yy - 209000 
$N\sum xy = i_x i_y\left(N\sum x'y' - (\sum x')(\sum y')\right)$
$\phantom{N\sum xy} = N\sum XY - (\sum X)(\sum Y)$
$2N\sum xy = N\sum X^2 + N\sum Y^2 - N\sum (X - Y)^2 - 2(\sum X)(\sum Y)$
гү? 
s, «(ng 
ar GH! 


NZy —Q(NZ(/y — (Zyy) 
= NZY:- (ZY)? 


Y.—-u)vN ГМО = 2) 
he te = (F. - m) ILIY — (Ery)? 
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(Y. - by) | /N(N —2) 
"aVi-rnV N-I 
$s_{y.x}^2 = \frac{\sum x^2 \sum y^2 - (\sum xy)^2}{\sum x^2\,(N - 2)}$

$\frac{b_{yx}^2}{s_b^2} = \frac{(\sum xy)^2 (N - 2)}{\sum x^2 \sum y^2 - (\sum xy)^2}$

11 Other Measures of Relationship


This chapter will take up methods of measuring relationship
in situations where the assumptions specified in Chapter 10
are not wholly met and will consider the appropriate tests of
significance. It will also deal with certain problems related to
the use of the product moment correlation in special situations.

Biserial Correlation. When one of two traits is scored on a 
continuous scale and the other on a scale having only two values, 
the resulting correlation is called a biserial correlation. The scale 
with only two values is called dichotomous. This situation arises 
very often in statistical studies. A vocational counselor is con- 
cerned to know the relationship of some measured trait or traits 
to such a dichotomous trait as employment status (employed or 
unemployed), success on the job (fired or not fired), sex (male or 
female), liking for a type of work or activity (liking, disliking), 
technical training for a job (trained, untrained), and the like. A 
test writer needs to analyze the items in a test to retain those 
which have a high correlation with respect to some criterion and 
to reject those which have a low correlation. If the criterion is a 
scaled trait and the item is dichotomous, a biserial correlation is 
called for. The item scores are dichotomous if only two answers 
are possible (such as “true” or “false”), or if several possible 
answers are classified into those which are acceptable and those 
which are not acceptable. In an attempt to discover measures 
which could be used to predict success in a professional school, 
one might use “survival” as a criterion, classifying students into 
groups, those who drop out and those who complete the course. 
A biserial coefficient of correlation is used in a very wide variety of 
situations of which the foregoing are illustrative. 

Two different coefficients of biserial correlation will be discussed,
the point biserial coefficient for which we shall use the
symbol $r_{pb}$ and the biserial coefficient for which we shall use the
symbol $r_{bis}$. These symbols are not standardized; $r_{bis}$ is in fairly
wide use but there is no generally accepted symbol for the point
biserial.

The Point Biserial Coefficient of Correlation. This coefficient
provides a very simple solution to the problem of biserial correlation.
Arbitrary numbers are assigned to the two classes of the
dichotomous variable. The outcome will be the same no matter
what numbers are chosen but the use of 0 and 1 holds computational
labor to a minimum and so these are commonly chosen.
Each individual then has a pair of scores, one which is based on
a continuous scale and one which is either 0 or 1. The correlation
coefficient may then be computed by the formulas of Chapter 10.
This is a product moment correlation. It can be used in combination
with other product moment coefficients in a multiple regression
equation. It has the same range of values as other product moment
r's, from −1 to +1. Its significance is tested in similar fashion.

However the general formula can be reduced to a very simple
formula for computing the point biserial coefficient. Suppose N
individuals have been measured, and of these $N_1$ fall in the category
which is arbitrarily scored X = 1 while $N_0$ fall into the class
which is arbitrarily scored X = 0. Let the means of the scaled
trait for these two classes be $\bar{Y}_1$ and $\bar{Y}_0$. For the entire group
$N = N_1 + N_0$ and mean is $\bar{Y} = \frac{1}{N}(N_1\bar{Y}_1 + N_0\bar{Y}_0)$ and

$s_y^2 = \frac{\sum (Y - \bar{Y})^2}{N - 1}$

Then the point biserial correlation is

(11.1)   $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{\frac{N_1 N_0}{N(N - 1)}}$

and will be positive if $\bar{Y}_1$ is larger than $\bar{Y}_0$. If $N_1/N = p$ and
$N_0/N = q$, the formula may be conveniently written as

(11.2)   $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{\frac{Npq}{N - 1}}$

If a number of biserial coefficients are to be computed from the
same data, the work is reduced by using a formula involving $\bar{Y}$
(which can be computed from the entire group and remains constant
from one correlation to another) and either $\bar{Y}_0$ or $\bar{Y}_1$ but not
both. If $N_0$ is smaller the formula involving $\bar{Y}_0$ is used; if $N_1$ is
smaller, the formula involving $\bar{Y}_1$.

(11.3)   $r_{pb} = \frac{\bar{Y} - \bar{Y}_0}{s_y}\sqrt{\frac{N N_0}{N_1 (N - 1)}}$

or

(11.4)   $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}}{s_y}\sqrt{\frac{N N_1}{N_0 (N - 1)}}$

Only very simple algebra is needed to demonstrate the equivalence
of these formulas and their derivation from the more general
formula for the product moment coefficient.
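The equivalence between the point biserial formula and the ordinary product moment coefficient can also be verified numerically. The following is a minimal sketch in Python: it computes r once by the general product moment formula on 0 and 1 scores and once from the two group means, the standard deviation $s_y$ (with N − 1 in the denominator) and the group sizes. The ten pairs of scores are hypothetical, and the function names are illustrative assumptions.

```python
import math
import statistics


def pearson_r(xs, ys):
    """Product moment correlation from deviations about the means."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def point_biserial(n1, n0, mean1, mean0, s_y):
    """(Y1bar - Y0bar)/s_y * sqrt(N1*N0 / (N*(N-1))), s_y based on N - 1."""
    n = n1 + n0
    return (mean1 - mean0) / s_y * math.sqrt(n1 * n0 / (n * (n - 1)))


# Hypothetical data: x is the dichotomy scored 0 or 1, y the scaled trait.
x = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
y = [52, 47, 55, 40, 44, 39, 50, 49, 41, 45]

y1 = [v for g, v in zip(x, y) if g == 1]
y0 = [v for g, v in zip(x, y) if g == 0]
r_general = pearson_r(x, y)
r_short = point_biserial(len(y1), len(y0), statistics.fmean(y1),
                         statistics.fmean(y0), statistics.stdev(y))
print(f"general formula: {r_general:.4f}; short formula: {r_short:.4f}")
```

The two values agree to the limits of machine rounding, which is what the algebraic identity requires.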


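If N is very large so that $\sqrt{N/(N - 1)}$ may be treated as 1, the formula
reduces to any of the following forms:

(11.5)   $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_y}\sqrt{pq}$

(11.6)   $r_{pb} = \frac{\bar{Y} - \bar{Y}_0}{s_y}\sqrt{\frac{q}{p}}$

(11.7)   $r_{pb} = \frac{\bar{Y}_1 - \bar{Y}}{s_y}\sqrt{\frac{p}{q}}$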

The use of point biserial correlation may be illustrated by data
gathered by Spaney. A battery of tests was administered to
308 student nurses shortly after their admission to nursing school.
Among the measures obtained was a rating of the student nurses
on the trait steadiness-emotionality, and these ratings may be
treated as a continuous scale. Six months later, at the end of
the preclinical period, 29 students had withdrawn. The mean
steadiness-emotionality rating for the 29 withdrawals was
$\bar{Y}_0 = 35.07$ and for the 279 who stayed on it was $\bar{Y}_1 = 30.00$. For the entire
group of 308 the standard deviation was $s_y = 11.62$. Hence the
point biserial coefficient was

$r_{pb} = \sqrt{\frac{(29)(279)}{(308)(307)}}\cdot\frac{30.00 - 35.07}{11.62} = -.13$

Since the scale is so arranged that a high score indicates steadiness
and a low score emotionality, the correlation suggests a slight
tendency for those staying on to be, in general, less steady, more
emotional than those withdrawing. The correlation is significantly
different from zero according to the test discussed in a
subsequent section, but the correlation is so low that rating could
not be used to furnish any useful prediction as to whether an
individual student is or is not likely to withdraw.


TABLE 11.1. Point Biserial and Biserial Correlation Coefficients Obtained
for 20 Subjects on Each of 5 Items [table body not reproduced]

Table 11.1 presents the kind of data commonly assembled by
a test technician making an item analysis, except that he would
have more subjects and more items. Since $\bar{Y} = 43.4$ for every
item, use of Formula (11.3) would make it unnecessary to
compute both $\bar{Y}_0$ and $\bar{Y}_1$, so the experienced test technician might
compute only the one based on the smaller number of cases. For
item 3 the computation of point biserial would in this case be
$r_{pb} = \frac{47.0 - 43.4}{s_y}\sqrt{\frac{N N_1}{N_0 (N - 1)}}$

Test of Significance for Point Biserial Correlation. It will be
assumed that in the population the distribution of Y scores can be
represented by two normal curves, one for the Y scores paired
with X = 0 and one for the Y scores paired with X = 1. The
two curves have the same standard deviation $\sigma_{y.x} = \sigma_y\sqrt{1 - \rho_{pb}^2}$, where
$\rho_{pb}$ is the point biserial coefficient for the population. The mean for
the Y scores paired with X = 0 is $\mu_0 = \mu_y + \beta_{yx}(0 - \mu_x)$ and the
mean of Y scores paired with X = 1 is $\mu_1 = \mu_y + \beta_{yx}(1 - \mu_x)$.

These are the same assumptions of normality of distribution
and homogeneity of variance which were made in Chapter 7 for
testing the hypothesis that the means of two populations are equal.
To test the hypothesis that $\rho_{pb} = 0$, the statistic

(11.8)   $t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$   may be used as in Chapter 10.

It has "Student's" distribution with N − 2 degrees of freedom.
If $\rho_{pb} = 0$, then $\beta_{yx} = 0$ and $\mu_0 = \mu_1$, and the test that $\rho_{pb} = 0$ is also the
test that the two normal distributions have the same mean. If
any of the formulas for $r_{pb}$ is substituted in (11.8) the latter can
be reduced identically to the familiar formula for testing the
difference between the means of two populations with equal variance:

(11.9)   $t = \frac{\bar{Y}_1 - \bar{Y}_0}{\sqrt{\frac{\sum y_1^2 + \sum y_0^2}{N_1 + N_0 - 2}\left(\frac{1}{N_1} + \frac{1}{N_0}\right)}}$

where $\sum y_1^2$ and $\sum y_0^2$ are the sums of squared deviations of the Y scores
about their own group means. For the correlation obtained for item 3 of
Table 11.1, Formula (11.8) and Formula (11.9) give the same value,
t = 1.01 approximately.

Confidence limits for $\rho_{pb}$ may be obtained by an application of
the non-central t distribution. This is the distribution of t when
the mean of t, E(t), is not equal to zero. "Student's" distribution
may be regarded as central since it is the distribution of t when
E(t) = 0. Applications of non-central t including tables for its
use have been given by Johnson and Welch (see reference 7).


When $\rho_{pb} = 0$ the mean of $t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$ is zero, hence t has
"Student's" distribution. When $\rho_{pb} \neq 0$ the mean of $t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$
is not zero but depends on $\rho_{pb}$, hence the distribution is non-central.
For each value of $\rho_{pb}$, t has a separate distribution. If a value of r
has been observed the set of non-central distributions make possible
the calculation of confidence limits. (For a fuller discussion,
see reference 12.)

When the Johnson-Welch tables are not available, and when
N is not less than 20, an approximation using tables of normal
probability will give fairly satisfactory results. Suppose an interval
with confidence coefficient 1 − α is sought.

First compute $t = \frac{r_{pb}\sqrt{N - 2}}{\sqrt{1 - r_{pb}^2}}$

Then compute

(11.10)   $d_1 = t + z_{\alpha/2}\sqrt{1 + \frac{t^2}{2(N - 2)}}$
(11.11)   and   $d_2 = t + z_{1-\alpha/2}\sqrt{1 + \frac{t^2}{2(N - 2)}}$

Then the required interval is

(11.12)   $\frac{d_1}{\sqrt{N + d_1^2}} < \rho_{pb} < \frac{d_2}{\sqrt{N + d_2^2}}$

The reader should be warned that the approximation becomes
inaccurate if confidence coefficients of less than .95 are used. However
the Johnson-Welch tables make it possible to obtain confidence
intervals at a variety of levels.

From the point biserial correlation of −.13 obtained for 308
student nurses, previously described, we shall now make an
interval estimate for $\rho_{pb}$.

$t = \frac{-.13\sqrt{306}}{\sqrt{1 - (.13)^2}} = -2.29$

Since t has "Student's" distribution with 306 degrees of freedom
when $\rho_{pb} = 0$, it is clear that the hypothesis $\rho_{pb} = 0$ may be rejected.

To calculate confidence limits with 1 − α = .95 we have

$d_1 = -2.29 - 1.96\sqrt{1 + \frac{(-2.29)^2}{2(306)}} = -4.26$

$d_2 = -2.29 + 1.96\sqrt{1 + \frac{(-2.29)^2}{2(306)}} = -.32$

Then   $\frac{-4.26}{\sqrt{308 + (-4.26)^2}} < \rho_{pb} < \frac{-.32}{\sqrt{308 + (-.32)^2}}$

or   $-.236 < \rho_{pb} < -.018$

The Biserial Coefficient of Correlation. A different approach 
to biserial correlation was made by Karl Pearson (reference 15). 
His method depends on the assumption that the dichotomous 
variable is actually continuous. Call this variable X and assume 
further with Pearson that the distribution of X is unit normal, 
that the regression of the continuous variable Y on X is linear
and that for each X the variance of Y is the same as for every
other X. A further assumption will be made below that the joint
distribution of X and Y is normal bivariate.

Suppose that in the dichotomous variable a proportion p of
the cases in the sample are in one class, and a proportion q = 1 − p
are in the other. The corresponding situation on the continuous
variable X is represented by locating a point A on the X scale
such that an ordinate drawn at A divides the hypothetical normal
distribution of the dichotomous variable into two areas with
proportions p and q. We suppose that each individual in the
sample has a value on the X scale, but that the only information
available about an X value is that it is greater than or less than A.

To illustrate this situation, suppose that we wish to compute 
the relationship between response to an objective test item and 
the total score on the test. For each person we know whether he 
answered the item correctly or incorrectly, and know his total 
test score. Under the assumptions made above, the person's 
actual knowledge about the subject matter tested by the item 
has some value on the scale of the continuous variable X, but the 
only information available about his knowledge is that it exceeds, 
or is less than, the value of X at some point A on the X scale.
That point can be determined by the proportion of persons who
answer the item correctly and the tables of the normal curve. In
actual practice the score corresponding to A is not needed, as will
be shown in the following discussion.
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In practice, we find the coordinates of two points on the line 
of regression of Y on X and use this information to estimate the
correlation between the variables. One of these points has coordinates
$\bar{X}_1$ and $\bar{Y}_1$, which are the means for the two variables
of those persons who answered the item incorrectly. Let q be
the proportion and qN the number of such persons. Then q is
represented by the area under the normal curve to the left of A.
The X mean of that part of the normal distribution is given by
the formula

(11.13)   $\bar{X}_1 = -\frac{u}{q}$

where u is the ordinate * of the unit normal probability curve at
the point A. This ordinate can be read from Table III if the table
is entered with the area between the ordinates at A and at the mean. The
value of $\bar{Y}_1$ is computed directly from the Y scores of the qN persons
in this subgroup.

The second point has coordinates $\bar{X}_2$ and $\bar{Y}_2$. $\bar{X}_2$ is the mean
of that portion of the unit normal curve to the right of A. This
portion has area p and mean

(11.14)   $\bar{X}_2 = \frac{u}{p}$

The value of $\bar{Y}_2$ is computed directly from the scores of the pN
persons who answered the item correctly.

The two points which determine the regression line are therefore
the points $\left(-\frac{u}{q}, \bar{Y}_1\right)$ and $\left(\frac{u}{p}, \bar{Y}_2\right)$. The line passing through
these two points is the regression line and its slope is

(11.15)   $b_{yx} = \frac{\bar{Y}_2 - \bar{Y}_1}{\frac{u}{p} + \frac{u}{q}} = (\bar{Y}_2 - \bar{Y}_1)\frac{pq}{u}$

The slope of the regression line is also given by the expression

(11.16)   $b_{yx} = r_{bis}\frac{s_y}{s_x}$

Since the distribution of X has been taken to be unit normal
$s_x = 1$ and $b_{yx} = r_{bis}s_y$. Equating the two expressions for the
slope we have

$(\bar{Y}_2 - \bar{Y}_1)\frac{pq}{u} = r_{bis}s_y$

Therefore we have the formula

(11.17)   $r_{bis} = \frac{\bar{Y}_2 - \bar{Y}_1}{s_y}\cdot\frac{pq}{u}$

Under the assumption that the joint distribution of X and Y is
normal bivariate $r_{bis}$ is an estimate of ρ. It can be regarded as an
appropriate estimate only when the sample is large.

* Most texts use the letter z for this ordinate, but we have already used z to represent
an abscissa of a normal curve.
We can also write fsi in the form 


(11.18) nue EE 


For item 3 of Table 11.1, p = .25 and u = .3177

$r_{bis} = \frac{47.0 - 42.2}{9.17}\cdot\frac{(.25)(.75)}{.3177} = .31$

An interesting — and disconcerting — aspect of biserial r is 
that its value may be numerically larger than 1. In fact, the 
range of its values is unbounded in either direction. 
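The biserial coefficient for item 3 can be recomputed without Table III, since the ordinate u is the height of the unit normal curve at the point A that cuts off the proportion q to its left. The following is a minimal sketch in Python using the standard library's normal distribution; the function name and argument names are illustrative assumptions.

```python
from statistics import NormalDist


def biserial_r(mean_upper: float, mean_lower: float, s_y: float, p: float) -> float:
    """(Y2bar - Y1bar)/s_y * p*q/u, where p is the proportion in the upper
    class and u the unit normal ordinate at the point of dichotomy."""
    q = 1.0 - p
    unit_normal = NormalDist()
    point_a = unit_normal.inv_cdf(q)   # area q lies to the left of A
    u = unit_normal.pdf(point_a)       # ordinate at A
    return (mean_upper - mean_lower) / s_y * p * q / u


# Item 3 of Table 11.1: means 47.0 and 42.2, s_y = 9.17, p = .25.
r_bis = biserial_r(47.0, 42.2, 9.17, 0.25)
u_item3 = NormalDist().pdf(NormalDist().inv_cdf(0.75))
print(f"u = {u_item3:.4f}, r_bis = {r_bis:.2f}")
```

Because nothing in the formula bounds the ratio of the mean difference to $s_y$, a sufficiently large difference between the two class means will give a value numerically greater than 1.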

Sampling Theory of the Biserial Correlation Coefficient. The 
exact sampling distribution for ть» is not known. Karl Pearson 
derived a formula for its standard error, but that formula involves 
the unknown parameter р and substitution of rj; for p is not 
satisfactory. 

An extensive study of biserial coefficients has recently been
completed by R. F. Tate. One valuable outcome of his study
is a transformation for r_bis similar in purpose to Fisher's logarithmic
transformation of r to z in Table XII. When the dichotomous
cut is so made that in the population P = Q = 0.5 (approximately),
when N is large, and when r_bis is not close to -1 or +1, Tate shows
that a quantity z* (read "z star") has a normal distribution with
standard error

(11.19)    $\sigma_{z^*} = \dfrac{1}{\sqrt{N}}$


The relation of z* to r_bis is

(11.20)    $z^* = 1.0297\,\{\log_{10}(1 + .8944\,r_{bis}) - \log_{10}(1 - .8944\,r_{bis})\}$

The corresponding formula for the population is

(11.21)    $\zeta^* = 1.0297\,\{\log_{10}(1 + .8944\,\rho) - \log_{10}(1 - .8944\,\rho)\}$

Table XII giving the transformation of r to z may be used instead
of making the direct computation from logarithms. To utilize
this table we use the following transformations:

(11.22)    $r = .8944\,r_{bis}$  or  $r_{bis} = 1.118\,r$
(11.23)    $z = \tfrac{1}{2}\{\log_e(1 + r) - \log_e(1 - r)\}$
(11.24)    $z^* = .8944\,z$  or  $z = 1.118\,z^*$


Procedure for obtaining an interval estimate for ρ from r_bis:

(1) Multiply r_bis by .8944.

(2) Read z corresponding to .8944 r_bis from the table.

(3) Multiply this value by .8944 to obtain z*.

(4) Make whatever test of hypothesis or interval estimate is
required for ζ*.

(5) If an interval estimate has been made for ζ* and the corresponding
interval is required for ρ, multiply the upper
and lower limits of the obtained interval by 1.118 for
values with which to reenter the table.

(6) Using the values obtained in step (5) read the corresponding
values of r from the table.

(7) Multiply these values of r by 1.118 to obtain the limits
for ρ.


Example. Given r_bis = .72 when N_0 = 163 and N_1 = 185. Find an
interval estimate for ρ with .95 confidence coefficient. Since

p = 185/348 = .5316,

it may be assumed that P is near enough to 0.5 to justify using the method.

(1) .8944 r_bis = .6440

(2) If r = .6440, z = .765

(3) z* = .8944(.765) = .684

(4) σ_z* = 1/√348 = .0536
    .684 - 1.96(.0536) < ζ* < .684 + 1.96(.0536)
    .579 < ζ* < .789

(5) 1.118(.579) = .647
    1.118(.789) = .882

(6) If z = .647, r = .570
    If z = .882, r = .707

(7) 1.118(.570) = .637
    1.118(.707) = .790
    Then .637 < ρ < .790
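The seven steps can also be carried out by direct computation instead of table look-up. The sketch below is illustrative only: the function name `tate_interval` is hypothetical, the constants .8944 and 1.118 are taken as 2/√5 and √5/2, and the inverse hyperbolic tangent replaces the reading of Table XII.

```python
import math

SHRINK = 2 / math.sqrt(5)   # the constant .8944 of Formulas (11.22) and (11.24)
EXPAND = math.sqrt(5) / 2   # the constant 1.118, reciprocal of .8944


def tate_interval(r_bis, n, z_crit=1.96):
    """Approximate interval for rho from biserial r through z* (steps 1 to 7).

    Hypothetical helper; valid only under the conditions stated in the text
    (P near 0.5, large N, r_bis not close to -1 or +1).
    """
    z = math.atanh(SHRINK * r_bis)           # steps (1) and (2)
    z_star = SHRINK * z                      # step (3)
    se = 1 / math.sqrt(n)                    # Formula (11.19)
    lo_star = z_star - z_crit * se           # step (4)
    hi_star = z_star + z_crit * se
    lo = EXPAND * math.tanh(EXPAND * lo_star)   # steps (5) to (7)
    hi = EXPAND * math.tanh(EXPAND * hi_star)
    return z_star, lo, hi


z_star, lower, upper = tate_interval(0.72, 163 + 185)
print(round(z_star, 3), round(lower, 3), round(upper, 3))
```

Small differences in the third decimal place from the worked example are to be expected, since the example rounds at every step.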


Another aspect of the statistic r_bis which Tate investigated is its
efficiency, or the degree to which r_bis approximates ρ. The approximation
is good when ρ = 0 and increasingly poorer as ρ approaches
-1 or +1. The approximation is good when P = .5 and increasingly
poorer as P is removed from 0.5.

Comparison of Biserial (r_bis) and Point Biserial (r_pb) Coefficients.
The following general comment is taken from Tate: "It
would seem from the evidence presented that point biserial r is
in most cases the better coefficient to use." The two coefficients
may now be compared on several points.

a. The statistical model. For r_bis a normal bivariate universe is
assumed. If this is the case, then the distribution of Y cannot be
normal within the separate categories. For r_pb no assumption is
made as to the distribution of the dichotomous variable, but generalization
is made only to a universe of samples of size N having
the same fixed number of cases N_0 and N_1 in the dichotomous categories.
For r_pb it is assumed that Y is normally distributed within
each X category and that the two Y distributions have the same
variance. The validity of this assumption can be ascertained by
testing the two distributions for equality of variance and testing,
possibly only by inspection, the normality of each distribution.

b. Range of values of r. For point biserial the range is from 
—1 to +1. For biserial r the range is unlimited. 

c. Sampling distribution and tests of significance. The exact
sampling distribution of biserial r is unknown and significance
can be tested only if the transformation to z* can be made. This
is defensible only when P is near 0.5, N is large, and ρ is not near
1 or -1. The exact sampling distribution of point biserial r is
known and tests of significance and confidence intervals can be
obtained for any value of p and any size of N.

d. Use with other r's in a regression equation. For biserial r
such use is very dubious; for point biserial r it is legitimate.

e. Position of dichotomous cut. For either coefficient, better
results are obtained when N_0 = N_1 than when they differ in size.

Fourfold Correlation. A measure of relationship may be 
needed in a problem where both traits are scored dichotomously. 
The chi-square test may be used in this situation to test for com- 
plete lack of relationship between the two traits, as was done in 
Chapter 4. However when the chi-square test indicates a signif- 
icant relationship it does not itself provide a measure of the 
strength of that relationship. 

In Spaney's study of the prediction of success of students in
schools of nursing, students entering a school of nursing were
asked to mark items in a personality inventory and their responses
were studied in relation to their subsequent completion of the
course or withdrawal before the end of the course. Responses to
the item "Do you like performing on the radio" may be used as
an illustration.

                                         Withdrawing   Not withdrawing   Total
Some liking for performing on radio           29             126          155
No liking for performing on radio              9              15           24
Total                                         38             141          179

χ² = 4.38        √χ² = 2.09


The chi-square test indicates that at the .05 level we must 
reject the hypothesis of independence between liking for perform- 
ing on the radio and remaining in a school of nursing. If then the 
traits are not to be assumed independent, how strong is the 
relationship between them? 

Two methods of correlating dichotomously scored traits will 
be discussed. These methods are parallel to the two biserial 
coefficients. 

The Phi Coefficient. As for the dichotomous trait in the point
biserial coefficient, arbitrary values may be assigned to the two
categories into which the X-scale is divided and also to the two
categories into which the Y-scale is divided. For this purpose
any numbers whatever may be used and the final outcome will
be the same, but the work is minimized by using 0 and 1, as was
done for the point biserial r. Let a, b, c and d represent frequencies,
as in the formula for chi-square:

            0        1
   1        b        a        a + b
   0        d        c        c + d
          b + d    a + c        N

The application of the product moment formula gives the correlation
between the two traits as

(11.25)    $\phi = \dfrac{ad - bc}{\sqrt{(a + b)(c + d)(a + c)(b + d)}}$

Thus for the Spaney data, φ = .157.

(11.26)    $\chi^2 = N\phi^2$

where χ² is the statistic already given by Formula (4.7). Hence
the significance of φ is tested by referring Nφ² to a chi-square
table with 1 degree of freedom or by referring √(Nφ²) = φ√N to
a table of normal probability. In other words, if the null hypothesis
for χ² is justified, the null hypothesis for φ is justified
and vice versa. 
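The relation between φ and χ² can be checked numerically on the Spaney data. The following is a minimal sketch; the function name `phi_coefficient` is hypothetical, and the assignment of the four printed frequencies to the letters a, b, c, d is an assumption that affects only the sign of φ, not its size.

```python
import math


def phi_coefficient(a, b, c, d):
    """Phi for a fourfold table: (ad - bc) over the root of the product of the margins."""
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


# Frequencies of the radio item; the lettering of the cells is assumed here.
a, b, c, d = 29, 126, 9, 15
n = a + b + c + d

phi = phi_coefficient(a, b, c, d)
chi_square = n * phi ** 2            # chi-square recovered from phi
normal_deviate = abs(phi) * math.sqrt(n)

print(round(abs(phi), 3), round(chi_square, 2), round(normal_deviate, 2))
```

The size of φ, the chi-square value and its square root agree with the values .157, 4.38 and 2.09 quoted for these data, apart from rounding in the last place.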

No way of finding confidence limits for φ is known.

Yates' correction should be applied to the numerator of φ
whenever it is needed for the numerator of χ².

In order to have the positive value of φ indicate a positive
relationship, the table should be so set up that a and d represent
the frequencies of individuals who possess both traits or neither
trait while b and c represent the frequencies of individuals who
possess one trait and not the other. (If a and d are used to represent
frequencies of individuals possessing one trait and not the
other, then the numerator of φ should be written bc - ad.)

The φ coefficient is a product moment correlation.

In making item analyses it is often convenient to divide the
criterion group into two equal parts, so that a + c = b + d = ½N.
This arrangement reduces the formula for χ² and for φ to very
simple special formulas which permit considerable saving of time
when each of many dichotomous items is to be related to the same
dichotomous criterion. Under these circumstances

(11.27)    $\phi = \dfrac{a - b}{\sqrt{(a + b)(c + d)}}$

(11.28)    and  $\chi^2 = \dfrac{N(a - b)^2}{(a + b)(c + d)}$

where c + d is the number of individuals failing the item, a + b
is the number passing the item, a is the number in the upper criterion
group who pass the item, b is the number in the lower
criterion group who pass the item, d is the number in the lower
criterion group who fail the item, and c is the number in the upper
criterion group who fail the item.
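The reduction to the special formulas can be verified by a short derivation. It is offered only as a sketch, and it uses nothing beyond the condition that each half of the criterion group contains N/2 cases, the general formula for φ, and the relation χ² = Nφ².

```latex
\begin{aligned}
c &= \tfrac{N}{2} - a, \qquad d = \tfrac{N}{2} - b,\\[2pt]
ad - bc &= a\left(\tfrac{N}{2} - b\right) - b\left(\tfrac{N}{2} - a\right) = \tfrac{N}{2}\,(a - b),\\[2pt]
\phi &= \frac{\tfrac{N}{2}\,(a - b)}{\sqrt{(a+b)(c+d)\,\tfrac{N}{2}\cdot\tfrac{N}{2}}}
      = \frac{a - b}{\sqrt{(a+b)(c+d)}},\\[2pt]
\chi^2 &= N\phi^2 = \frac{N\,(a - b)^2}{(a+b)(c+d)}.
\end{aligned}
```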

Tetrachoric Correlation. This coefficient is developed on the 
assumption that both variables are continuous and that the joint 
distribution of the variables is normal bivariate. From this point 
of view the fourfold may be regarded as a scatter diagram divided 
into four quadrants by lines parallel to the coordinate axes. The 
specific X and Y values of the observations are unknown, all that 
is known is the frequency, or number of observations, in each of 
the four quadrants. From these frequencies the value of ρ in the
normal bivariate population is estimated.

In his original paper on the tetrachoric coefficient of correlation
Karl Pearson derived some extremely complex formulas
for computation of this coefficient. These formulas will not be
reproduced here. However, some approximations also derived by
Pearson will be presented.

When both variables are split approximately at the median,
that is when a + b = c + d and a + c = b + d approximately, then
r_t, the tetrachoric coefficient, is given approximately by the formula

(11.29)    $r_t = \sin\left\{90^\circ\,\dfrac{(a + d) - (b + c)}{N}\right\}$

Formula (11.29) is exact when the relations a + b = c + d and
a + c = b + d hold exactly.

Under other circumstances an approximation to r_t is obtainable
from the formula

(11.30)    $r_t = \cos\left\{180^\circ\,\dfrac{\sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}\right\}$

A series of very useful diagrams from which the value of tetra- 
choric r could be read by graphical interpolation were published 
by Chesire, Saffir and Thurstone but are now out of print. 
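For readers without access to such diagrams, the two approximations may be sketched in Python. The function names are hypothetical, and the forms used are assumptions: the sine form sin{90°[(a + d) - (b + c)]/N} for tables split at both medians, and the cosine form cos{180° √(bc)/(√(ad) + √(bc))} otherwise. The example table is made up for illustration.

```python
import math


def tetrachoric_median_split(a, b, c, d):
    """Sine approximation, assumed form, for tables split near both medians."""
    n = a + b + c + d
    return math.sin(math.radians(90.0 * ((a + d) - (b + c)) / n))


def tetrachoric_cosine(a, b, c, d):
    """Cosine approximation, assumed form; needs ad and bc not both zero."""
    root_ad = math.sqrt(a * d)
    root_bc = math.sqrt(b * c)
    return math.cos(math.radians(180.0 * root_bc / (root_ad + root_bc)))


# Hypothetical table split exactly at both medians: a = d = 40, b = c = 10.
r_sine = tetrachoric_median_split(40, 10, 10, 40)
r_cosine = tetrachoric_cosine(40, 10, 10, 40)
print(round(r_sine, 3), round(r_cosine, 3))
```

When a = d and b = c the two forms coincide, which gives a convenient check on the arithmetic.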

Comparison of Phi and Tetrachoric r. Since phi and tetrachoric
r are both used to compute relationships in a fourfold, or
doubly dichotomous, distribution some discussion of the relative
merits of the two methods is called for.

The phi coefficient is preferred:

1. Because phi can easily be computed for all distributions.
Tetrachoric r has meaning only for large samples. It is difficult
to compute r_t when one of the marginal proportions such as
(a + b)/N is small.

2. Because the phi coefficient readily provides a test for the
hypothesis of independence through χ². Use of r_t for this purpose
presents almost insurmountable difficulties.


3. When the traits are logically dichotomous rather than 
continuous. 


The tetrachoric coefficient is preferred:

1. When the traits involved in the relationship may logically
be assumed to be continuous.

2. Because tetrachoric r provides an estimate of ρ. Phi is
not an estimate of a parameter.

3. Because tetrachoric r ranges from -1 to +1 regardless of
the marginal frequencies. On the other hand, the values of phi
are restricted by the marginal frequencies. Consider, for example,
the fourfold distribution with marginal frequencies 100, 100,
160, 40. For this fourfold, r_t can take values from -1 to +1,
whereas phi is restricted to the range -½ to +½. A situation such
as that indicated in the fourfold may arise in correlating responses
to an item with test scores using high-low halves. The range of
phi is always restricted when the ratio of marginal frequencies
for one variable differs from this ratio in the other variable.
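The restriction on phi can be confirmed by direct computation. The sketch below uses a hypothetical helper, `phi_range`, which fills the fourfold table at the two extreme values that one cell can take once the marginal totals are fixed.

```python
import math


def phi_range(row_totals, col_totals):
    """Smallest and largest phi attainable with the given marginal totals."""
    r1, r2 = row_totals
    c1, c2 = col_totals
    n = r1 + r2
    if n != c1 + c2:
        raise ValueError("row and column totals must have the same sum")
    denom = math.sqrt(r1 * r2 * c1 * c2)
    values = []
    # One cell determines the whole table once the margins are fixed,
    # and phi is linear in that cell, so the extremes lie at its end points.
    for a in (max(0, r1 + c1 - n), min(r1, c1)):
        b = r1 - a
        c = c1 - a
        d = r2 - c
        values.append((a * d - b * c) / denom)
    return min(values), max(values)


low, high = phi_range((100, 100), (160, 40))
print(low, high)
```

For the marginal frequencies 100, 100, 160, 40 the limits are -½ and +½, as stated above.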

Estimating Correlation from the Tails of the Distribution. 
A time-saving method devised by Flanagan * is useful for estimat- 
ing relationship when one variable is dichotomous and the other 
continuous. Such a situation arises in correlating a test item 
with scores on a test. The scores on the test are arranged in 
order of size and the upper and lower 27% are selected for further 
computation. The middle 46% are put aside. From the proportion
of correct responses in each of the high and low 27% groups
a measure of relationship is obtained.

The coefficient calculated in this way is an estimate of ρ in a
bivariate normal population. The dichotomous trait is assumed
to be actually continuous. However, the only information available
is that given by the frequencies in the four corners of the
hypothetical scatter diagram. The estimate of ρ obtained in this
way has been shown by Kelley and by Mosteller to be more
efficient than tetrachoric r, at least when ρ is zero. Though
biserial r can be used in this situation Flanagan's method is very
much faster.

Table XIII (prepared by Flanagan) provides a ready means of 
computing the correlation coefficient when the percentages in the 
upper and lower 27% groups are known. In analyzing test items 
it is convenient, when possible, to select a sample of N = 370 cases
because 27% of 370 = 100. The upper 100 and lower 100 of the
papers in the sample are then used for further computation. For
each item the per cents correct in the two selected groups are
used for estimation of ρ.

In Chapter 16 a method analogous to the one just described 
is used for estimating p when both variables are continuous. 
Though this method is less efficient than product moment r it is 
much faster to use; so it is sometimes advisable to use a larger 
sample and an inefficient method of estimation.

The Correlation Ratio. The correlation ratio is a measure of
relationship which is useful in two circumstances:

1. When both variables are continuous but the regression is
not linear. This situation is illustrated in the relationship between
age and IQ in Table 11.2, page 279.

2. When one variable is continuous and the other is discrete. 
Problems of this sort were considered in Chapter 9 under the head- 
ing of analysis of variance. 

The development of a formula for the correlation ratio is
similar to its development for the product moment coefficient r
in Chapter 10. By simple algebra the following relationship can
be derived from Formula (10.4):

(11.31)    $r^2 = 1 - \dfrac{\sum (Y - \tilde{Y})^2}{\sum (Y - \bar{Y})^2}$

In this formula Ỹ is given by a linear regression equation. When
linear regression cannot be assumed the estimate Ỹ is no longer
applicable. Instead, the mean of a column of the scatter diagram
is the best estimate of the scores in that column. Call the mean
of the jth column Ȳ_j and replace Ỹ in Formula (11.31) by Ȳ_j.
Designating the correlation ratio of Y on X by E_yx, we then obtain
the formula

(11.32)    $E_{yx}^2 = 1 - \dfrac{\sum_{j=1}^{k}\sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y}_j)^2}{\sum_{j=1}^{k}\sum_{\alpha=1}^{N_j} (Y_{j\alpha} - \bar{Y})^2}$

Here k is the number of columns and α runs over the N_j scores in the jth column.
If both variables are continuous, there are two correlation
ratios which in general are not equal. By analogy the correlation
ratio of X on Y is E_xy, where

(11.33)    $E_{xy}^2 = 1 - \dfrac{\sum_{i=1}^{h}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)^2}{\sum_{i=1}^{h}\sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})^2}$

Here X̄_i is the mean of the ith row, and h is the number of rows,
which is not necessarily the same as the number of columns.
From the definition of the correlation ratio the following characteristics
of this coefficient can be determined.

1. If the means of all columns are equal to each other, and
therefore to the general mean, E²_yx = 1 - 1 = 0.

2. If the means differ greatly from each other and the observations
within a column are very close to the mean of the column,
E²_yx is close to one.

3. The square root of E²_yx is always taken positively, therefore
E_yx ranges from zero to one.

An alternative formula for E²_yx is obtainable through the subdivision
of the sum of squares as in analysis of variance:

(11.34)    $\sum_{j}\sum_{\alpha} (Y_{j\alpha} - \bar{Y})^2 = \sum_{j}\sum_{\alpha} (Y_{j\alpha} - \bar{Y}_j)^2 + \sum_{j} N_j (\bar{Y}_j - \bar{Y})^2$

Appropriate substitution in Formula (11.32) leads to the formula
for E²_yx

(11.35)    $E_{yx}^2 = \dfrac{\sum_{j} N_j (\bar{Y}_j - \bar{Y})^2}{\sum_{j}\sum_{\alpha} (Y_{j\alpha} - \bar{Y})^2}$

From Formula (11.35) it can be seen that E²_yx is the ratio which
the sum of squares of column means around the total mean bears
to the total sum of squares of scores around the total mean. A
convenient formula for computing E²_yx is

(11.36)    $E_{yx}^2 = \dfrac{\sum_{j} \dfrac{T_j^2}{N_j} - \dfrac{T^2}{N}}{\sum_{j}\sum_{\alpha} Y_{j\alpha}^2 - \dfrac{T^2}{N}}$

where $T_j = \sum_{\alpha} Y_{j\alpha}$ and $T = \sum_{j} T_j$.

If data are grouped in the form of a scatter diagram and scores
are coded, Formula (11.35) is modified by use of the following
notation:

N_ij is the frequency in a cell in the ith row and jth column.
N_i. is the total frequency in the ith row.
N_.j is the total frequency in the jth column.
N is the total number of cases in the sample.
x' and y' are coded X and Y scores as explained in Chapter 10.
T_i. = Σ_j N_ij x'_j is the sum of the x' scores in the ith row, each
multiplied by its appropriate frequency.
T_.j = Σ_i N_ij y'_i is the sum of the y' scores in the jth column,
each multiplied by the appropriate frequency, and h is the number
of rows.
T_x = Σ_i T_i.        T_y = Σ_j T_.j

Then we have the following formulas

(11.37)    $E_{yx}^2 = \dfrac{\sum_{j} \dfrac{T_{\cdot j}^2}{N_{\cdot j}} - \dfrac{T_y^2}{N}}{\sum_{i} N_{i\cdot}\,(y'_i)^2 - \dfrac{T_y^2}{N}}$

and

(11.38)    $E_{xy}^2 = \dfrac{\sum_{i} \dfrac{T_{i\cdot}^2}{N_{i\cdot}} - \dfrac{T_x^2}{N}}{\sum_{j} N_{\cdot j}\,(x'_j)^2 - \dfrac{T_x^2}{N}}$
Computation of E²_xy is illustrated in Table 11.2 which shows
the joint frequency distribution for age and intelligence quotient
of 109 pupils in a fourth grade. The arbitrary origin for age has
been placed at 96 months, with step intervals of 5 months. The
arbitrary origin for intelligence quotient has been placed at 66 with
step interval of 7.

The population values of the correlation ratios are indicated
by the symbols η_yx and η_xy. (η is the small Greek letter eta.) A
test for the hypothesis η_yx = 0 is given by the F ratio

(11.39)    $F = \dfrac{E_{yx}^2}{1 - E_{yx}^2} \cdot \dfrac{N - k}{k - 1}$

with n_1 = k - 1 and n_2 = N - k degrees of freedom. This is
identical with the F test for the hypothesis μ_1 = μ_2 = ... = μ_k
described in Chapter 9. This is logical, for η_yx = 0 if the column
means are equal in the population.

The test for linearity of regression is the test that η²_yx = ρ² and
is made by computing the ratio

(11.40)    $F = \dfrac{E_{yx}^2 - r^2}{1 - E_{yx}^2} \cdot \dfrac{N - k}{k - 2}$

with degrees of freedom n_1 = k - 2 and n_2 = N - k. This test is derived
from the analysis of the total variance into three independent
components shown as follows together with the degrees of freedom.

(11.41)    $s_y^2 = r^2 s_y^2 + (E_{yx}^2 - r^2)\,s_y^2 + (1 - E_{yx}^2)\,s_y^2$
           $N - 1 = 1 + (k - 1 - 1) + (N - k)$

Rearranging the order of columns has no effect on E_yx, but of
course may change r strikingly. Similarly rearranging the order
of rows has no effect on E_xy.
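The definitions of the correlation ratio and the two F ratios can be illustrated with a short Python sketch. All function names are hypothetical and the three columns of scores are made up for the purpose. The last line applies the computing form, with the correction term T²/N, to the totals 1878.78, 413, 109 and 2035 that appear in the computation for Table 11.2.

```python
def correlation_ratio_sq(columns):
    """E^2 of Y on X computed two ways: between-column share of the total
    sum of squares, and one minus the within-column share."""
    scores = [y for col in columns for y in col]
    grand_mean = sum(scores) / len(scores)
    total_ss = sum((y - grand_mean) ** 2 for y in scores)
    between_ss = 0.0
    within_ss = 0.0
    for col in columns:
        col_mean = sum(col) / len(col)
        between_ss += len(col) * (col_mean - grand_mean) ** 2
        within_ss += sum((y - col_mean) ** 2 for y in col)
    return between_ss / total_ss, 1 - within_ss / total_ss


def correlation_ratio_sq_from_totals(sum_t2_over_n, grand_total, n, sum_sq):
    """Computing form based on group totals and the correction term T^2/N."""
    correction = grand_total ** 2 / n
    return (sum_t2_over_n - correction) / (sum_sq - correction)


def f_eta_zero(e2, n, k):
    """F ratio for the hypothesis that the population correlation ratio is zero."""
    return (e2 / (k - 1)) / ((1 - e2) / (n - k))


def f_linearity(e2, r2, n, k):
    """F ratio for the test of linearity of regression."""
    return ((e2 - r2) / (k - 2)) / ((1 - e2) / (n - k))


# Made-up scores in three columns, for illustration only.
columns = [[1, 2, 3], [6, 7, 8], [2, 3, 4]]
e2_between, e2_within = correlation_ratio_sq(columns)
f_zero = f_eta_zero(e2_between, n=9, k=3)

# Totals quoted in the computation for Table 11.2.
e2_table = correlation_ratio_sq_from_totals(1878.78, 413, 109, 2035)
print(e2_between, f_zero, round(e2_table, 3))
```

The agreement of the two values returned by `correlation_ratio_sq` is simply the subdivision of the total sum of squares into its between-column and within-column parts.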
TABLE 11.2 Computation of the Correlation Ratio of Age
on Intelligence Quotient for 109 Fourth Grade Children

(Columns of the scatter diagram: age to nearest month. Rows: intelligence
quotient. The cell frequencies and the entries shown as ... are illegible.)

IQ          y'    N_i.    T_i.    T_i.²/N_i.
147-153     12      1       1        1.00
140-146     11      4       9       20.25
133-139     10      4      10       25.00
126-132      9     10      25       62.50
119-125      8    ...     ...        ...
112-118      7     16      49      150.06
105-111      6     15      51      173.40
 98-104      5     15      69      317.40
 91-97       4      9      42      196.00
 84-90       3    ...     ...        ...
 77-83       2    ...     ...      228.17
 70-76       1      1      12      144.00
 63-69       0      3      30      300.00
Sum               109     413     1878.78

$E_{xy}^2 = \dfrac{1878.78 - \dfrac{(413)^2}{109}}{2035 - \dfrac{(413)^2}{109}} = \dfrac{(1878.78)(109) - (413)^2}{(2035)(109) - (413)^2} = .668$

E_xy = √.668 = .817

r = -.723

Correlation Among Ranks. There are two situations in which
it may be expedient to work with ranks rather than scores and to
seek a measure of relationship among ranks. The first situation
arises when there is no satisfactory device for scoring the trait in 
question but the individuals can be placed in a rank order in re- 
spect to the degree of the trait they exhibit. In this situation it is 
possible to say for any two individuals which one is higher on the 
scale for this trait but not possible to say how much higher. The 
reader will readily think of many situations of this kind, as e.g. 
the attempt to place in order of merit the performances of a number
of contestants for an award in an area where the criteria
cannot easily be reduced to quantitative terms, or to place individuals
in order in respect to their possession of some intangible
such as "attractiveness," "sense of responsibility," "courtesy,"
and the like. If the group is large, determining the order of 
individuals becomes very difficult, so the rank order correlation 
is obviously more useful for small samples. 

A second situation arises when traits which can be measured 
on a scale are recorded as ranks instead. The chief occasion for 
doing this is when the distribution of scores is obviously not normal 
and a measure of relationship is sought which does not depend for 
its validity upon the assumption of a normal bivariate universe. 
The advantage of the rank order correlation coefficient in this 
situation is worthy of considerable attention. 

Nearly all the tests of significance commonly used are derived 
on the assumption that sampling is from a normal universe. In 
the case of the mean, those tests are still valid even when the 
universe is far from normal for the distribution of the mean rapidly 
becomes normal as N increases (except for a very special case 
which the theoretical statistician likes to talk about but which 
is not at all likely to be encountered in practice). The sampling 
distribution of the variance and of the correlation coefficient are 
seriously disturbed by lack of normality. If a population is bivariate
normal, the standard error of r is $\sigma_r = \dfrac{1 - \rho^2}{\sqrt{N}}$, but if it is not
bivariate normal that standard error may be quite different, but
how much different is usually unknown.

The formula for rank order correlation, given originally by
Spearman, is

(11.42)    $R = 1 - \dfrac{6\sum d^2}{N(N^2 - 1)}$

where N is the number of individuals ranked and d is the difference
in the ranks assigned to the same individual. In the computation,
a useful check is provided by the fact that Σd = 0. This formula
is derived by applying the usual product moment formula to the
ranks (see reference 6).

Suppose 12 cakes submitted in a competition at a country fair
have been ranked by two judges with results as shown in Table
11.3. Then
$R = 1 - \dfrac{6(40)}{12(144 - 1)} = 1 - \dfrac{20}{143} = \dfrac{123}{143} = .86$

TABLE 11.3 Computation of Coefficient of Correlation Among Ranks
Assigned to 12 Cakes

          Rank Assigned by
Cake    Judge I    Judge II      d      d²
A          7          6          1       1
B          8          4          4      16
C          2          1          1       1
D          1          3         -2       4
E          9         11         -2       4
F          3          2          1       1
G         12         12          0       0
H         11         10          1       1
I          4          5         -1       1
J         10          9          1       1
K          6          7         -1       1
L          5          8         -3       9
                                  Σd² = 40

$R = 1 - \dfrac{6(40)}{12(143)} = 1 - \dfrac{20}{143} = \dfrac{123}{143}$


Now let the ranks assigned by Judge I be denoted X and those
assigned by Judge II be denoted Y. Then

ΣX = 78                               ΣY = 78
ΣX² = 650                             ΣY² = 650
Σx² = 650 - (78)²/12 = 143            Σy² = 650 - (78)²/12 = 143
ΣXY = 630                             Σxy = 630 - (78)(78)/12 = 123

and    $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \sum y^2}} = \dfrac{123}{\sqrt{(143)(143)}} = \dfrac{123}{143} = .86$

This computation illustrates the fact that the rank order coefficient
and the product moment coefficient applied to the ranks are
identical. However, if scores are transformed to ranks — a pro- 
cedure which Hotelling calls uniformizing the distribution — the 
product moment correlation of the ranks (which is the rank order 
coefficient) is almost certain to be different from the product 
moment coefficient of the original scores **. 
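The identity of the two coefficients for the cake data can be checked with a brief sketch. The function names are hypothetical. Judge I's rank for cake H is entered as 11, the value consistent with d = 1 and with a rank total of 78.

```python
import math

# Ranks for cakes A to L.
judge_1 = [7, 8, 2, 1, 9, 3, 12, 11, 4, 10, 6, 5]
judge_2 = [6, 4, 1, 3, 11, 2, 12, 10, 5, 9, 7, 8]


def spearman_rank_order(x_ranks, y_ranks):
    """Rank order coefficient: 1 - 6 * sum(d^2) / (N * (N^2 - 1))."""
    n = len(x_ranks)
    diffs = [x - y for x, y in zip(x_ranks, y_ranks)]
    if sum(diffs) != 0:
        raise ValueError("rank differences should sum to zero")
    return 1 - 6 * sum(d * d for d in diffs) / (n * (n * n - 1))


def product_moment(x, y):
    """Ordinary product moment r from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rank_order_R = spearman_rank_order(judge_1, judge_2)
r_on_ranks = product_moment(judge_1, judge_2)
print(round(rank_order_R, 2), round(r_on_ranks, 2))
```

Both values equal 123/143, or .86 to two decimals. The equality holds for untied ranks; tied ranks are not considered in this sketch.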

Test of Significance for Rank Order Coefficient. For very small
samples from a bivariate population in which the variables X and
Y are uncorrelated so that ρ_xy = 0, the exact distribution of the
rank order coefficient can be obtained by direct enumeration of
the number of permutations of the ranks which would produce
each possible value of the correlation. The distribution of R is
discrete and is bimodal in small samples. A table obtained in this
way, giving the probability distribution, not for R, but for Σd², has
been computed by Kendall with N ≤ 8. Obviously, the work
of direct enumeration, while possible, becomes very laborious for
larger values of N. (For N = 8 the number of permutations is
8! = 40320. For N = 9 it would be 9! = 362880.) Using this table
the values of R required for significance at a given level, from
samples of a given size, may be computed. Such values are given
in Table XVI in the Appendix.
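Direct enumeration of this kind is easy to sketch on a computer for small N. The code below is an illustration only, with hypothetical function names; it is not a reproduction of Kendall's table. It lists every permutation of the ranks, tallies Σd², and accumulates the probability that Σd² is as small as an observed value, which is the probability that R is as large as the observed R when the ranks are unrelated.

```python
from collections import Counter
from itertools import permutations


def sum_d2_distribution(n):
    """Exact frequency of each value of sum(d^2) over all n! orders of n ranks."""
    base = range(1, n + 1)
    counts = Counter(
        sum((x - y) ** 2 for x, y in zip(base, perm))
        for perm in permutations(base)
    )
    return dict(sorted(counts.items()))


def prob_R_at_least(n, sum_d2_observed):
    """P(sum d^2 <= observed), which equals P(R >= observed R) under no relation."""
    dist = sum_d2_distribution(n)
    total = sum(dist.values())
    favorable = sum(c for s, c in dist.items() if s <= sum_d2_observed)
    return favorable / total


print(sum_d2_distribution(3))
print(prob_R_at_least(5, 2))
```

The number of permutations grows as N!, so this approach is practical only for the small values of N discussed above.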

Comparison of the entries in Table XVI with the corresponding
entries in Table XI (where N = n + 1) shows (at least for these
small values of N) that the rank order correlation must be larger
than the product moment correlation to achieve the same level
of significance.

For larger samples, an approximation is needed. If ρ = 0,
the standard error is

(11.43)    $\sigma_R = \dfrac{1}{\sqrt{N - 1}}$

and the sampling distribution of R approaches the normal form
as N increases. If N is as large as 25,

(11.44)    $z = R\sqrt{N - 1}$

may be safely referred to tables of normal probability. How to
deal with samples larger than 8 and smaller than 25 is still a problem.
For this situation the coefficient τ mentioned later in the
chapter may be used. How to test hypotheses other than ρ = 0
is also not known.

Estimation of Product Moment Coefficient from Rank Order
Coefficient. The relation between the product moment correlation
and the rank order coefficient was derived by Karl Pearson and
later confirmed by Hotelling. It is

(11.45)    $\tilde{r} = 2\sin\dfrac{\pi R}{6}$

Then

(11.46)    $\sigma_{\tilde{r}}^2 = \dfrac{\pi^2}{9(N - 1)}$

is the variance of r̃. Since the sampling variance of the product
moment r is $\sigma_r^2 = \dfrac{1}{N - 1}$ when ρ = 0, the variance of r̃ is

(11.47)    $\sigma_{\tilde{r}}^2 = \dfrac{\pi^2}{9}\,\sigma_r^2 = 1.097\,\sigma_r^2$

The meaning of this formula is that if in a sample of 100 cases from
a normal bivariate population, the product moment coefficient is
significant, it will require a sample of 110 cases to achieve the same
degree of significance for the rank order coefficient. Hence the
rank order coefficient has efficiency 1/1.097 = .91.

Formula (11.45) has often been used to "convert the rank order
coefficient to a product moment coefficient." Unfortunately, it
cannot perform that magic and its usefulness is somewhat dubious.
A continuous variate is really not susceptible to ranking. If
scores are available, the procedure of transforming them to ranks,
computing the coefficient of rank order correlation, and then
applying Formula (11.45) to attempt to estimate what the product
moment coefficient might have been had it been computed, has
nothing whatever to recommend it. Usually it does not even
save time.

Relation Among Ranks Given by Several Judges. Suppose
that 8 individuals have been ranked by 4 different judges who may
be called P, Q, R, and S as in Table 11.4. A single measure of the
general agreement among all four judges is desired. Probably it
would occur to most persons that the correlation of each judge
with every other judge might be found and the average of all such
coefficients taken. This has been done for the four judges of
Table 11.4, with results recorded at the right of that table. The
mean of the six rank order coefficients is R̄ = .730. If there are
m judges the number of the coefficients which must be computed
in order to find R̄ is m(m - 1)/2.

TABLE 11.4 Computation of the Coefficient of Concordance from Ranks
Assigned to 8 Subjects by 4 Judges

(Entries shown as ... are illegible.)

           Rank Given by Judge      Sum of              Pair of     Rank order
Subject    P     Q     R     S      Ranks    Square     Judges      coefficient
A          1     4     3     3        11       121      P, Q           .762
B          2     1     4     1         8        64      P, R           .714
C          3     3     2     6        14       196      P, S           .738
D          4     2     1     2         9        81      Q, R           .738
E          5    ...   ...   ...       22       484      Q, S           .810
F          6    ...   ...    5        21       441      R, S           .619
G          7     8    ...   ...       28       784      Sum           4.381
H          8     7     8     8        31       961      Mean           .730
Sum                                  144      3132

S = 3132 - (144)²/8 = 540        W = 12(540)/[16(8)(63)] = 45/56 = .804

Kendall has proposed a coefficient of concordance among rankings
and has obtained an approximation to its sampling distribution.
This coefficient of concordance has a linear relation
to R̄.

The steps in the computation of the coefficient of concordance 
are as follows: 

1. Find the sum of the ranks given by the m judges to each
subject. In Table 11.4 these sums are in column 6.

2. Verify that the sum of these sums-of-ranks is mN(N + 1)/2.
In Table 11.4 this is 4(8)(9)/2 = 144.

3. Find the mean of these sums-of-ranks. Here this is 18.

4. Obtain the sum of the squares of the deviations of the N
sums-of-ranks around their mean and call it S. For the data under
consideration, the deviations of the entries in the sixth column
from 18, which is their mean, are respectively

-7, -10, -4, -9, 4, 3, 10, 13

and the sum of the squares of these eight deviations is 540.

Alternatively, 3132 - (144)²/8 = 540.


If agreement among the ranks were perfect, the value of S would be
m²N(N² - 1)/12. The student can verify this by writing down a
set of ranks, duplicating it several times, and computing S. The
measure W is defined as the ratio of the observed S to the value S
would have if there were perfect agreement among the several
rankings:

(11.48)    $W = \dfrac{12S}{m^2 N(N^2 - 1)}$

is the coefficient of concordance. It may have values ranging from
0 to 1 but it cannot be negative. For the data under discussion,
W = 45/56 = .804. Then R̄, the average rank order coefficient among
all possible pairings of the judges, is related to W by the formula

(11.49)    $\bar{R} = \dfrac{mW - 1}{m - 1}$
For these data R = sd -1 - 3 = .730 which agrees perfectly 


with the value obtained previously as the mean of six computed coefficients.

For small values of m (the number of judges) and N (the number
of subjects judged) the exact probability distribution of S
can be obtained by enumerating all possible permutations of the
N ranks for the m judges. Kendall has done this and furnishes a
table (page 412 of his Volume I) showing the probability that a
given S will be attained or exceeded for N = 3 and m = 2, 3,
... 10. He also gives a table for N = 4, m = 2, 3, 4, 5, or 6,
and for N = 5, m = 3. For larger values of N and m the F test
may be used:

(11.50)    $F = \dfrac{(m - 1)W}{1 - W}$

with degrees of freedom  $n_1 = N - 1 - \dfrac{2}{m}$

and  $n_2 = (m - 1)\left(N - 1 - \dfrac{2}{m}\right)$

If N and m are small, the correction for continuity should be made
by subtracting 1 from S and increasing the divisor of W by 2, so
that

(11.51)    $W' = \dfrac{S - 1}{\dfrac{m^2 N(N^2 - 1)}{12} + 2} = \dfrac{12(S - 1)}{m^2 N(N^2 - 1) + 24}$

For the data of Table 11.4, the test of significance would be

$W' = \dfrac{12(540 - 1)}{16(8)(63) + 24} = .7997$

and  $F = \dfrac{3W'}{1 - W'} = 11.97$  has n_1 = 6.5, n_2 = 19.5.

Reference to the F table for n_1 = 6 or n_1 = 7 and n_2 = 19 or n_2 = 20
shows that the largest of the values in the four cells indicated is
F.99 = 3.94. The observed F is even larger than that value so that
interpolation is needless. There can be no hesitancy in stating
that the four sets of ranks show agreement far beyond what
might be produced by sampling variance.
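The computation of S, W, the corrected W' and the F ratio can be sketched from the eight sums of ranks alone. The function name `concordance` and the dictionary keys are hypothetical; the sums of ranks are those given for the eight subjects ranked by the four judges.

```python
def concordance(rank_sums, m):
    """Coefficient of concordance and its F test from the sums of ranks.

    rank_sums holds one sum per subject; m is the number of judges.
    """
    n = len(rank_sums)
    # Check: the sums of ranks must total m * N * (N + 1) / 2.
    if sum(rank_sums) != m * n * (n + 1) // 2:
        raise ValueError("sums of ranks do not total mN(N + 1)/2")
    mean = sum(rank_sums) / n
    s = sum((t - mean) ** 2 for t in rank_sums)
    divisor = m ** 2 * n * (n ** 2 - 1)
    w = 12 * s / divisor
    w_corrected = 12 * (s - 1) / (divisor + 24)    # correction for continuity
    f_ratio = (m - 1) * w_corrected / (1 - w_corrected)
    n1 = n - 1 - 2 / m
    n2 = (m - 1) * n1
    return {"S": s, "W": w, "W_corrected": w_corrected,
            "F": f_ratio, "n1": n1, "n2": n2}


result = concordance([11, 8, 14, 9, 22, 21, 28, 31], m=4)
print(result)
```

The sketch uses the corrected W' in the F ratio, as in the worked example. For larger N and m the uncorrected W could be substituted.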

For such data what is the “best estimate” of the “true rank- 
ing”? These data have shown a significant concordance. Let it 
be assumed that the relations among the rankings reflect the true 
ranking, which of course may not be the case. Then the ranking 
of the sum of the ranks is the “best” ranking in the least square 
sense (see page 421 of reference 10). For the data considered, this 
would make the "best ranking" B, D, A, C, F, E, G, H. If the
data had not shown significant concordance among the various 
rankings, then no composite ranking could be considered 
meaningful. 

The Tau Coefficient. Kendall (see references 10 and 11) has
proposed a coefficient called τ (tau) which provides an alternative
method of measuring relationship between ranks. While not quite
so simple to compute as Spearman's coefficient of Formula (11.42),
τ has the great advantage that its sampling distribution is known,
especially in that area for moderate sized values of N larger than
10 and too small for use of the normal approximation, where there
is no very good method of testing the rank order coefficient. This
measure will not be described here because to test its significance
the tables Kendall has computed are needed when N is small.

Relation Between Two Traits Expressed in Qualitative Cate- 
gories. The existence of relation can be tested by the chi-square 
test, but its measurement is not always possible without additional 
assumptions. The case in which there are just two categories for 
each trait has been discussed in an earlier section of this chapter. 
If for each trait the qualitative categories can be placed in a 
meaningful order, it may be possible to assign numerical scores to 
the categories and compute a product moment correlation with 
these scores. One assumption which might be used to obtain 
such scores if it appears reasonable is that the various categories 
are spaced at equal intervals along a scale, so that the numbers 
0, 1, 2, 3, 4, . . . may be assigned to the categories. Another is 
that each trait has a normal distribution. Ranks may then be 
normalized as described in Chapter 17. 

If the categories for either variable do not have a meaningful
order there is no satisfactory way of measuring relationship,
although $\chi^2$ can be computed to test the hypothesis that no
relationship exists. If the categories for both variables can be
arranged in meaningful order and if $\chi^2$ is large, so that the
hypothesis of independence of the two variables is rejected, the
research worker is likely to feel an intense need of some measure
of relationship. For this situation Karl Pearson proposed the
coefficient of contingency

$C = \sqrt{\dfrac{\chi^2}{N + \chi^2}}$

This is not a very satisfactory measure of relationship but under
the circumstances no better measure is available. If $\chi^2 = 0$, then
C = 0, but that situation very seldom occurs. C cannot be negative.
Its maximum value for a table of k rows and k columns is
$\sqrt{\dfrac{k - 1}{k}}$. Imagine 200 cases distributed in 4 rows and 4 columns
with 50 cases in each diagonal cell and all other cells empty. The
200 scores could not be distributed in a way to indicate higher
relationship. For this table

$\chi^2 = 600$ and $C = \sqrt{\dfrac{600}{600 + 200}} = .87$, not 1.00
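
The arithmetic of this example can be checked with a short Python sketch. The function names `chi_square` and `contingency_coefficient` are illustrative choices, not notation from the text, and the table is the hypothetical 4 by 4 diagonal table just described.

```python
import math

def chi_square(table):
    """Return (chi-square, N) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def contingency_coefficient(table):
    """Pearson's coefficient of contingency, C = sqrt(chi2 / (N + chi2))."""
    chi2, n = chi_square(table)
    return math.sqrt(chi2 / (n + chi2))

# 200 cases, 50 in each diagonal cell of a 4 x 4 table, all other cells empty.
k = 4
table = [[50 if i == j else 0 for j in range(k)] for i in range(k)]

chi2, n = chi_square(table)
c = contingency_coefficient(table)
c_max = math.sqrt((k - 1) / k)   # stated maximum for a k x k table

print(chi2, n, round(c, 2), round(c_max, 2))
```

For this table the computed C coincides with the stated maximum, which is why even a perfect diagonal arrangement cannot yield C = 1.00.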
12  The Statistics of Measurement


All measurement is infested with error. The physicist, 
the engineer, the astronomer, the surveyor, as well as the edu- 
cator and the psychologist are vividly aware of this liability in 
their data and aware of the importance of holding measurement 
error to a minimum and of estimating the effects of such error 
as cannot be controlled. In the preceding chapters of this book we 
have apparently assumed that observations could be taken at their 
face value and could be treated as perfectly accurate measures of
individuals on whatever traits they purport to measure. The 
discussion in earlier chapters of this text has proceeded almost as 
though the only important uncertainties in statistical studies were 
uncertainties about population parameters when these are es- 
timated from sample statistics. However, in the social sciences 
the variance due to errors in measuring the individuals selected 
is often larger than the variance due to sampling errors in selecting 
individuals. Furthermore, while sampling errors have a random
effect upon the statistics computed, measurement errors affect
certain statistics in a systematic, not a random, fashion and so
cannot be safely disregarded. If measurement error has remained
unnoticed in the preceding chapters that has been done for the 
sake of the reader, to allow him to grasp one thing at a time. 

The discussion in the present chapter will be directed toward 
answering such questions as the following. If the same individuals 
take two forms of a mental test and the correlation between the 
two sets of scores is found to be only, say, .60, will the test be 
useful in a situation where predictions are to be made concerning 
the performance of individuals? Will it be useful in a study con- 
cerning group performance? How does the variability among 
scores on one test form condition the answers to these questions? 
If the test can be lengthened, how will that affect the correlation 
between forms? Is it feasible to make the test long enough to 
yield a correlation of .90 between two forms? What is the differ- 
ence in the information provided by the correlation between the 
results of giving the same test twice, of giving two test forms on
two different days, of giving two test forms on the same day, of
giving one test form and correlating scores on chance halves? If 
the correlation between two test forms is low, how will that fact 
affect a test of significance employing that test as a measure of 
the trait concerned? 

Symbols. In this chapter we shall be concerned with multiple 
measures made on more than one trait for many individuals. It 
will be necessary to distinguish the trait, to distinguish the in- 
dividual, to distinguish the measure. In order to be unambiguous 
the symbolism must be explicit, and consequently somewhat 
cumbersome. 

The letters X, Y, Z, W will be used to name the traits (the
variates). Thus tests on American history might be denoted X
and vocabulary tests denoted Y.

$X_{h\alpha}$ will indicate the score on trial h for individual $\alpha$. The
first subscript will represent the test form used or the repetition
made. Thus $X_{37}$ might mean the score made by individual 7 on
the third form of test X, or it might mean the score made by
individual 7 on the third trial of task X.

In Table 12.1 are displayed the symbols for mN scores of N 
individuals on m forms (or m trials) of test X, together with the 
various means and variances. For the sample of N individuals
the means and variances of the different test forms present no new 
concept. The set of m trials furnishes a sample of m observations 
of the performance of each individual with mean and variance 
as indicated. 

The student should note that the word “test” as used in the
present discussions connotes some measure of an individual and
is not related to “test of significance” as used elsewhere in the
text.

For the universe of individuals the mean for trial h is the
expected value of $X_{h\alpha}$ summed over individuals. This concept is
thoroughly familiar but the subscript $\alpha$ will now be attached to
the symbol E to indicate that summation is over individuals, not
trials, thus

(12.1)    $\mu_h = E_\alpha(X_{h\alpha})$

For the universe of trials for individual $\alpha$ the mean is the
expected value of $X_{h\alpha}$ summed over the trials. This value will be
denoted “tau sub alpha,”

(12.2)    $\tau_\alpha = E_h(X_{h\alpha})$

the subscript h being used to indicate that summation is over the
trials, not the individuals.

For the universe of individuals the variance on test form h,
or trial h, will be denoted

(12.3)    $\sigma_h^2 = E_\alpha(X_{h\alpha} - \mu_h)^2$

while for the universe of trials for individual $\alpha$ the variance will
be denoted

(12.4)    $\sigma_\alpha^2 = E_h(X_{h\alpha} - \tau_\alpha)^2$

The score on test form h (or trial h) made by individual $\alpha$ may be
described by the equation

(12.5)    $X_{h\alpha} = \tau_\alpha + e_{h\alpha}$

Here $e_{h\alpha}$ is an error made in measuring individual $\alpha$ on test form
h and $\tau_\alpha$ is the expectation of $X_{h\alpha}$ for the individual, or the value
around which the various measures of that individual fluctuate
as they are affected by different kinds of measurement error. $\tau_\alpha$
is not amenable to direct observation but is surrounded by a
kind of haze of measurement error in the same way that an
inaccessible population parameter is shrouded in a haze of sampling
error.
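
The two directions of averaging can be made concrete with a small Python sketch. The scores below are invented for illustration (m = 3 forms, N = 4 individuals), and the divisor N in `pvariance` is an assumption made for simplicity rather than the convention of Table 12.1. With only three trials, an individual's mean is merely a sample stand-in for $\tau_\alpha$, which is defined over the whole universe of trials.

```python
from statistics import mean, pvariance

# Hypothetical scores: each row is one test form h, each column one individual.
scores = [
    [52, 61, 47, 70],   # form 1
    [55, 58, 49, 68],   # form 2
    [49, 64, 45, 72],   # form 3
]

# Averaging over individuals, form by form: sample analogues of (12.1) and (12.3).
form_means = [mean(row) for row in scores]
form_variances = [pvariance(row) for row in scores]

# Averaging over trials, individual by individual: analogues of (12.2) and (12.4).
by_individual = list(zip(*scores))
individual_means = [mean(col) for col in by_individual]
individual_variances = [pvariance(col) for col in by_individual]

# Deviation of each score from that individual's own mean, a stand-in for
# the error term of (12.5).
errors = [[x - individual_means[a] for a, x in enumerate(row)] for row in scores]

print(form_means)
print(individual_means)
print(errors)
```

By construction the deviations for each individual sum to zero over the trials, just as the expectation of the error is zero when $\tau_\alpha$ is defined as the individual's expected score.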

The variance of $X_{h\alpha}$ from individual to individual is denoted
$\sigma_{x_h}^2$ for the population and $s_{x_h}^2$ for the sample, and is called an
observed variance, being the variance of observed scores. If the
different test forms are truly comparable measures of the same
trait it may be assumed that they all have equal variance, so that

(12.6)    $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \cdots = \sigma_{x_m}^2 = \sigma_x^2$

where $\sigma_x^2$ is the expectation of the variance of all possible scores of
all individuals around the population mean $\mu$.

Even when Equation (12.6) holds, the variance of the different
test forms need not be equal for a particular sample,

$s_{x_1}^2 \neq s_{x_2}^2 \neq \cdots \neq s_{x_m}^2$

The variance of $\tau$ from individual to individual, called $\sigma_\tau^2$
for the population and $s_\tau^2$ for the sample, is the true variance
among individuals. It is that component of the observed variance
which is due to genuine differences among individuals

(12.7)    $\sigma_\tau^2 = E_\alpha(\tau_\alpha - \mu)^2$

There is no way to compute $\sigma_\tau^2$ or $s_\tau^2$ directly, as $\tau$ cannot be
measured directly. Methods of inferring $\sigma_\tau^2$ or $s_\tau^2$ indirectly will
be discussed later in the chapter.

Components of an Observed Score. The error component
indicated as $e_{h\alpha}$ in Equation (12.5) can be considered as arising
from several different sources. The performance of any individual 
fluctuates from time to time partly because of changes in the 
individual himself, and partly because of changes in the environ- 
ment. There are errors in the observer so that the same observer 
varies in his judgment from time to time and the judgments of 
different observers on the same performance are not uniform. 
The measuring instrument itself may be inconsistent. 

In some situations it is possible to make separate estimates of 
the variance due to each of the three kinds of error. However, 
in most situations they are hopelessly confounded and their effects 
cannot be disentangled. Each of them may involve either ran- 
dom or systematic error, or both. In practical problems it is 
essential that the investigator try in advance of gathering his 
data to foresee what will be the chief sources of error, not only 
in order that he may control them as much as possible, but also 
in order that he may use the appropriate method of measuring the 
effect of such error on the reliability of his observations. There- 
fore, before going further, it will be well to consider some of the 
most common causes which play upon the “real” component $\tau$
and the error component e of an observed score.

If the individual is a person taking a psychological test, $\tau$ is
affected by the level of his general ability, by his familiarity with
such tests and his general ability to understand instructions, by
his knowledge of the pertinent subject matter or his skill in the
function tested, and the like. If the individual is a rat running
a maze, $\tau$ is affected by his learning ability, his maturity, his
physical vigor, the keenness of his senses, his testwiseness, and
the like. If the individual is a pipe, the internal diameter of which
is to be found, or a cement mixture tested for the pressure it
will bear, $\tau$ represents the average of all the large number of
measurements which could be made, measurements which will
differ one from the other because the object measured is not
completely uniform, the measuring instrument is not completely
reliable, and the person making the measurement has certain
human frailties of touch, vision, etc., which make his
measurements not absolutely dependable.

One component of error is due to the observer. If a test per- 
formance is to be timed, no human observer can be completely 
consistent in his timing even with the most accurate of stopwatches.
Moreover, some observers will show a systematic leniency
in timing, others a systematic tendency to call time a split-second 
too soon. When judgment of quality enters into the process of 
scoring, this component of error is never negligible and is seldom 
entirely random. Observer error is never wholly absent from 
reading a gauge such as a thermometer or pressure gauge, and it 
varies with the type of instrument. Observer error is at a mini- 
mum and perhaps wholly eliminated from scores on an objective 
test of the pencil and paper type. 

Another component of error is due to fluctuation in the in- 
dividual measured or in the environment. Since a person is never 
precisely the same at two different moments in time, temporary 
states of physical well being, emotional tension, preoccupation 
with other matters may affect his performance favorably or un- 
favorably as compared with his typical performance. Attention 
fluctuates continually and a momentary lapse of attention or 
memory may affect his performance. Distractions occur during 
the testing period, a pencil breaks, a fire siren sounds nearby, a 
neighbor sneezes disconcertingly. Conditions preceding the test- 
ing period are never uniform for all participants, and are usually 
unknown to the examiner. Individuals do not completely shed 
their worries or their excitements when they begin a test. In cer- 
tain kinds of tests the amount of recent practice on specific skills 
may affect the score. If the individual is a person, this component 
of error is always large. If it is an inanimate object this component 
is likely to be relatively smaller but not necessarily absent.
Materials expand and contract and wear out. Materials are not
completely uniform and samples cut from different parts of a 
piece of cloth or different points on a metal rod do not test alike. 
Machines do not perform with complete consistency at all times. 

Another sort of error comes from inconsistencies in the measur- 
ing instrument. This component is of paramount importance in 
pencil and paper tests for which a limited number of items has 
been selected out of a large pool of items. If different sets are 
randomly selected, they will not be equally difficult for all indi- 
viduals measured, but one set will favor some individuals and 
penalize others. This error is really a chance error produced by 
sampling of items. It is the only error inherent in the test itself 
and the only one affecting that which can properly be called the 
reliability of the test while the reliability of the observations is affected 
by other random errors as well.

If test items are not selected at random (and they usually are
not) there may be a consistent or systematic error as well as a 
random error in any particular set. Such consistent errors affect 
the validity of the test. This validity is affected not only by con- 
sistent errors in the test items but by consistent errors of the 
observer and of the individual measured. 

Over and above all the preceding, there are certain chance 
errors such as those related to luck in guessing the right answer 
on partial information, or no information. These may well be 
absorbed into one or the other of the preceding components of error. 

Effect of Measurement Error on the Mean. Random errors 
have a negligible effect upon the mean of a sample, because posi- 
tive and negative errors tend to cancel each other. However, 
because they increase the variability of the sample, as described 
in the next section, they increase the standard error of the mean 
and consequently decrease the reliability of the mean. Random 
errors may even completely obscure a genuine difference between 
two or more means. Therefore, when a sample has shown a non- 
significant difference between means the research worker is always 
obliged to consider the possibility that a real difference exists but 
that his observations are too unreliable to reveal it. For this 
purpose (as well as for other purposes) he needs to study the 
reliability of his observations. 

If all possible measures of each individual were taken, then the
mean of $X_{h\alpha}$ would be identical with the mean of $\tau_\alpha$ for the sample
as well as for the population. Even so the variance of $X_{h\alpha}$ would
be larger than the variance of $\tau_\alpha$. However, one never has all
possible measures of individuals, but usually has only one or two
measures for each. Then the sample mean of $X_{h\alpha}$ is not necessarily
identical with the sample mean of $\tau_\alpha$, though the discrepancy is
negligible if errors are random.

If only one or two measures are available for each individual 
and if in these there is a systematic error, one has no assurance 
that the effect of errors on the mean is small and no assurance that 
errors increase the variance. Bias in the observer, an abnormally 
easy or abnormally difficult test form, some aspect of the testing 
situation which served to distract the attention of all individuals 
tested, or the like, might depress or raise all scores on a particular 
trial or might depress or raise the scores of some individuals and 
not of all. Under such circumstances errors are not random and
their effect may be almost anything. It is not usually possible
to make a statistical test to discover whether errors are random
or not, and therefore a careful logical scrutiny of the experimental 
situation is particularly important. 

Let $e_{h\alpha}$ and $e_{j\alpha}$ represent the errors in the score of individual
$\alpha$ on test forms h and j. Since only two forms are postulated, we
can say nothing about the expected values obtained from all
possible test forms as indicated at the bottom of Table 12.1. The
expected values obtained by averaging $X_{h\alpha}$ over all individuals in
the population may be considered. If $X_{h\alpha}$ is a measure which differs
from $\tau_\alpha$ only because of random error, then the expectation of
such error is zero and so

$E_\alpha(e_{h\alpha}) = 0$

(12.8)    and $E_\alpha(X_{h\alpha}) = E_\alpha(\tau_\alpha + e_{h\alpha}) = E_\alpha(\tau_\alpha) + E_\alpha(e_{h\alpha}) = E_\alpha(\tau_\alpha)$

By the same argument, if errors are random because items have
been assigned to test forms at random, the expectation of scores on
any other test form is also equal to the expectation of $\tau_\alpha$.
Consequently if errors are random, the expectation of scores on any
one form is equal to the expectation of scores on any other, and
each is equal to the population mean.

(12.9)    $E_\alpha(X_{h\alpha}) = E_\alpha(X_{j\alpha}) = \cdots = E_\alpha(X_{m\alpha}) = E_\alpha(\tau_\alpha) = \mu$

and

(12.10)   $E_\alpha(X_{h\alpha} - \mu) = E_\alpha(X_{h\alpha}) - \mu = \mu - \mu = 0$

and

(12.11)   $E_\alpha(\tau_\alpha - \mu) = E_\alpha(\tau_\alpha) - \mu = 0$


For any particular sample, random errors may produce slight dis- 
crepancies among the means of different forms. If errors are not 
random, the test forms may be of unequal difficulty so that the 
expectation of errors is not zero over the population and therefore 
two test forms do not have the same population mean. 

Effect of Measurement Error on the Variance. Random 
errors of measurement consistently inflate the variance so that 
the variance of observed scores is larger than the variance of 
“true” scores. Because people who think intuitively rather than 
algebraically often assume that random errors might sometimes 
increase and sometimes decrease the variance, recourse must be 
made to the formula. By squaring $X_{h\alpha} - \mu = (\tau_\alpha - \mu) + e_{h\alpha}$ and
taking the expectation of each term for the population of
individuals, the following formula is obtained:

(12.12)   $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2 + 2\rho_{\tau e}\sigma_\tau\sigma_e$

If errors of measurement are uncorrelated with “true” scores, the
last term disappears leaving

(12.13)   $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$

The assumption that $\rho_{\tau e} = 0$ appears to be reasonable in very
many research situations. For the sample, $r_{\tau e}$ will usually not be
precisely zero.

As will be seen later, this inflation of the variance by random 
errors of measurement presents a serious hazard in tests of signif- 
icance, for unless it is possible to obtain fairly reliable observa- 
tions the hypothesis that two or more means are equal can be 
refuted only when differences among the means are very great. 
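
A small simulation may help readers who prefer to see the inflation of the variance rather than derive it. The sketch below is only illustrative: the normal distributions, their parameters, the seed, and the helper `pcovariance` are all assumptions, not part of the text. It checks the sample counterpart of (12.12), in which the cross term is twice the covariance of true scores and errors, that is, $2 r_{\tau e} s_\tau s_e$.

```python
import random
from statistics import mean, pvariance

random.seed(12)  # arbitrary seed so that the run is repeatable

N = 20_000
tau = [random.gauss(50, 10) for _ in range(N)]   # hypothetical "true" scores
err = [random.gauss(0, 5) for _ in range(N)]     # hypothetical random errors
obs = [t + e for t, e in zip(tau, err)]          # observed scores, as in (12.5)

def pcovariance(u, v):
    """Covariance with divisor N, to match statistics.pvariance."""
    mu_u, mu_v = mean(u), mean(v)
    return sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v)) / len(u)

var_tau = pvariance(tau)
var_err = pvariance(err)
var_obs = pvariance(obs)
cov_tau_err = pcovariance(tau, err)   # equals r_(tau e) * s_tau * s_e

# Sample counterpart of (12.12): holds exactly apart from rounding error.
identity_gap = var_obs - (var_tau + var_err + 2 * cov_tau_err)

print(round(var_tau, 1), round(var_err, 1), round(var_obs, 1))
print(round(cov_tau_err, 3), identity_gap)
```

Because the simulated errors are drawn independently of the true scores, the covariance term is small and the observed variance exceeds the true variance by roughly the error variance, as (12.13) indicates.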

Indirect Estimate of the “True” Variance $\sigma_\tau^2$. Suppose two
forms of a measure X are available, which may be called $X_1$ and
$X_2$, or $X_h$ and $X_j$ if greater generality is desired.

$X_{1\alpha} = \tau_\alpha + e_{1\alpha}$ and $X_{2\alpha} = \tau_\alpha + e_{2\alpha}$

The correlation between $X_1$ and $X_2$ is usually called the reliability
coefficient of the measure. It may be designated $\rho_{xx}$ for the
population and $r_{xx}$ for a sample.

Since $X_1$ and $X_2$ are comparable measures, it may be assumed
that errors are equally variable

(12.14)   $\sigma_{e_1} = \sigma_{e_2} = \sigma_e$

and errors are equally correlated with $\tau$

(12.15)   $\rho_{\tau e_1} = \rho_{\tau e_2} = \rho_{\tau e}$

If errors are random, $\rho_{\tau e}$ may be assumed to be zero. This is
usually the case. However, a test might conceivably be so
designed or so administered that persons with low standing tended
to incur consistently greater or consistently smaller measurement
errors than persons with high standing, and then $\rho_{\tau e}$ would not
be zero. If (12.14) and (12.15) hold, then

$\sigma_{x_1} = \sigma_{x_2} = \sigma_x$

The correlation $\rho_{xx}$ may be found by substituting

$X_{1\alpha} - \mu = (\tau_\alpha - \mu) + e_{1\alpha}$ for x and $X_{2\alpha} - \mu = (\tau_\alpha - \mu) + e_{2\alpha}$ for y

in the formula $\rho_{xy} = \dfrac{E(xy)}{\sigma_x \sigma_y}$, taking the expectation, and
reducing the result by the application of Formulas (12.14) and
(12.15). By this procedure it may be found that

(12.16)   $\rho_{xx} = \dfrac{\sigma_\tau^2 + 2\rho_{\tau e}\sigma_\tau\sigma_e + \rho_{e_1 e_2}\sigma_e^2}{\sigma_x^2}$

Now, if errors are random, $\rho_{\tau e} = 0$ and $\rho_{e_1 e_2} = 0$. In that case

(12.17)   $\rho_{xx} = \dfrac{\sigma_\tau^2}{\sigma_x^2}$

and for the sample, approximately,

(12.18)   $r_{xx} \approx \dfrac{s_\tau^2}{s_x^2}$

Formulas (12.17) and (12.18) constitute an important
interpretation of the reliability coefficient. If two measures differ from each
other only because of random errors of measurement, the correlation
between the two is the ratio of the variance of true scores to the variance
of observed scores.

Solving (12.17) for $\sigma_\tau$ and (12.18) for $s_\tau$ gives the formulas

(12.19)   $\sigma_\tau^2 = \sigma_x^2 \rho_{xx}$ or $\sigma_\tau = \sigma_x\sqrt{\rho_{xx}}$
(12.20)   $s_\tau^2 \approx s_x^2 r_{xx}$ or $s_\tau \approx s_x\sqrt{r_{xx}}$

which provide a method of indirectly estimating the variability
of the “true scores.”
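
The interpretation of the reliability coefficient as a ratio of variances can be illustrated by simulating two comparable forms. Everything in the sketch below is assumed for illustration: the normal distributions, the seed, and the helper `pearson_r`. In real data the true scores are of course not available, so the last comparison is possible only in a simulation.

```python
import math
import random
from statistics import mean, pvariance

random.seed(1220)  # arbitrary seed so that the run is repeatable

N = 20_000
tau = [random.gauss(50, 10) for _ in range(N)]   # hypothetical true scores
x1 = [t + random.gauss(0, 5) for t in tau]       # form 1: true score plus random error
x2 = [t + random.gauss(0, 5) for t in tau]       # form 2: same true score, new error

def pearson_r(u, v):
    """Product moment correlation of two equally long lists."""
    mu_u, mu_v = mean(u), mean(v)
    cov = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    ss_u = sum((a - mu_u) ** 2 for a in u)
    ss_v = sum((b - mu_v) ** 2 for b in v)
    return cov / math.sqrt(ss_u * ss_v)

r_xx = pearson_r(x1, x2)                          # reliability coefficient
variance_ratio = pvariance(tau) / pvariance(x1)   # ratio in (12.18)

# Indirect estimate of the true standard deviation by (12.20),
# compared with the value that only a simulation can reveal.
s_tau_estimate = math.sqrt(pvariance(x1) * r_xx)
s_tau_actual = math.sqrt(pvariance(tau))

print(round(r_xx, 3), round(variance_ratio, 3))
print(round(s_tau_estimate, 2), round(s_tau_actual, 2))
```

With the assumed variances of 100 for true scores and 25 for errors, both the correlation between forms and the variance ratio should fall near 100/125 = .80.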

Indirect Estimate of the Error Variance. By Formula (12.13)
the total observed variance was partitioned into two portions,
the variance of true scores plus the variance of errors of
measurement, $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$.
Since an estimate for $\sigma_\tau^2$ has now been found to be $\sigma_x^2\rho_{xx}$, an estimate
for $\sigma_e^2$ can be found by substitution. Then

(12.21)   $\sigma_e^2 = \sigma_x^2 - \sigma_x^2\rho_{xx} = \sigma_x^2(1 - \rho_{xx})$
(12.22)   $\sigma_e = \sigma_x\sqrt{1 - \rho_{xx}}$

At this point the student may profitably look again at Formula
(10.20) and compare it with (12.22) noting that the former involves
$\rho^2$ and the latter $\rho$. The error e may be thought of as a residual
error incurred in predicting a true score $\tau$ from an observed score
X by the regression equation

$\tilde{\tau}_\alpha = X_{h\alpha}$ with $\tilde{\tau}_\alpha - \tau_\alpha = e_{h\alpha}$

Then $\sigma_e^2$ is the variance of residual errors corresponding to $\sigma_{y \cdot x}^2$
of Chapter 10.

The corresponding formulas for the sample are reasonable
approximations if N is not too small. If N is small, $r_{\tau e}$ may differ
considerably from zero and N - 2, which is the appropriate
denominator for $s_e^2$, may be rather different from N - 1, which
is the denominator for $s_x^2$. With these reservations, however, we
may state the sample formulas as

(12.23)   $s_e^2 \approx s_x^2(1 - r_{xx})$
and
(12.24)   $s_e \approx s_x\sqrt{1 - r_{xx}}$

Effect of Measurement Error on a Coefficient of Correlation.
Let us suppose that $X_1$ and $X_2$ are two measures of trait X and
that $Y_1$ and $Y_2$ are two measures of trait Y, and that

(12.25)   $X_{1\alpha} = \tau_\alpha + e_{1\alpha}$      $Y_{1\alpha} = \upsilon_\alpha + d_{1\alpha}$
          $X_{2\alpha} = \tau_\alpha + e_{2\alpha}$      $Y_{2\alpha} = \upsilon_\alpha + d_{2\alpha}$

where $\tau_\alpha$ and $\upsilon_\alpha$ (upsilon) are the measures of those traits for
individual $\alpha$ without measurement errors and e and d are the
respective errors. Then $\rho_{\tau\upsilon}$ is the “true” correlation between X and Y,
that is, the correlation which would exist if there were no errors
of measurement, while $\rho_{xy}$ is the correlation between observed,
fallible measures.

We have seen that

$\mu_x = \mu_\tau$ and $\bar{X}$ approximates $\bar{\tau}$

$\sigma_x > \sigma_\tau$ and $s_x$ tends to be larger than $s_\tau$

What is the relation of $\rho_{xy}$ to $\rho_{\tau\upsilon}$, and of $r_{xy}$ to $r_{\tau\upsilon}$?

To answer this question, we shall assume that

$\sigma_{e_1} = \sigma_{e_2} = \sigma_e$,  $\sigma_{d_1} = \sigma_{d_2} = \sigma_d$,  $E(e_1) = E(e_2) = E(d_1) = E(d_2) = 0$
$\rho_{\tau e_1} = \rho_{\tau e_2}$,  $\rho_{\upsilon d_1} = \rho_{\upsilon d_2}$

The student should paraphrase each of these statements in words.
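We have already seen in Formula (12.19) that $\sigma_\tau^2 = \sigma_x^2\rho_{xx}$.
Similarly $\sigma_\upsilon^2 = \sigma_y^2\rho_{yy}$
and $E(X - \mu_x)(Y - \mu_y) = \rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon + \rho_{\upsilon e}\sigma_\upsilon\sigma_e + \rho_{\tau d}\sigma_\tau\sigma_d + \rho_{ed}\sigma_e\sigma_d$

But

(12.26)   $\rho_{xy} = \dfrac{E(X - \mu_x)(Y - \mu_y)}{\sigma_x\sigma_y}$

(12.27)   $\rho_{xy} = \dfrac{\rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon + \rho_{\upsilon e}\sigma_\upsilon\sigma_e + \rho_{\tau d}\sigma_\tau\sigma_d + \rho_{ed}\sigma_e\sigma_d}{\sigma_x\sigma_y}$

If errors are random they are uncorrelated with true scores so
$\rho_{\upsilon e} = 0$ and $\rho_{\tau d} = 0$, and they are uncorrelated with each other so
$\rho_{ed} = 0$. In that case

(12.28)   $\rho_{xy} = \dfrac{\rho_{\tau\upsilon}\sigma_\tau\sigma_\upsilon}{\sigma_x\sigma_y} = \rho_{\tau\upsilon}\sqrt{\rho_{xx}}\sqrt{\rho_{yy}}$

and therefore

(12.29)   $\rho_{\tau\upsilon} = \dfrac{\rho_{xy}}{\sqrt{\rho_{xx}\rho_{yy}}}$

The sample formula corresponding to (12.27) is obtained by
summing over N cases instead of taking the expectation. It holds
precisely:

(12.30)   $r_{xy} = \dfrac{r_{\tau\upsilon}s_\tau s_\upsilon + r_{\upsilon e}s_\upsilon s_e + r_{\tau d}s_\tau s_d + r_{ed}s_e s_d}{s_x s_y}$

Even if errors are random, the correlations $r_{\upsilon e}$, $r_{\tau d}$ and $r_{ed}$ would
usually be only approximately and not precisely zero for a sample.
Therefore, the sample formulas analogous to (12.28) and (12.29)
must be considered as approximations only:

(12.31)   $r_{xy} \approx r_{\tau\upsilon}\sqrt{r_{xx}}\sqrt{r_{yy}}$

(12.32)   $r_{\tau\upsilon} \approx \dfrac{r_{xy}}{\sqrt{r_{xx}r_{yy}}}$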

The preceding formulas indicate that when measurement error
is random, the correlation of observed scores X and Y is
numerically smaller than the correlation of “true” scores $\tau$ and $\upsilon$. This
lowering or attenuation of the correlation coefficient is inescapable
since the correlation coefficients $r_{xx}$ and $r_{yy}$ for the sample as well
as $\rho_{xx}$ and $\rho_{yy}$ for the population are less than 1. Formulas (12.29)
and (12.32), which give an estimate of the correlation between
scores freed from measurement error, are called the correction for
attenuation.

It must be noted that the correction for attenuation was ob- 
tained on the assumption that three of the four terms in the 
numerator of (12.27), or of (12.30), were zero. This assumption 
is almost impossible to verify empirically, but the research worker 
will find it instructive to think about situations in which it is 
unreasonable on a priori grounds. Two illustrations will be given. 
The possible variety of such is almost inexhaustible. 

Suppose that one form of test X and one form of test Y are
given on the same day, and that a few individuals are suffering
from some unusual strain which adversely affects their
performance on both. Then $r_{ed}$ is likely to be not zero, but positive.
Consequently, the right-hand member of (12.30) is likely to be larger
than the right-hand member of (12.31) and so Formula (12.32)
will overestimate $r_{\tau\upsilon}$.

Suppose only one form of test X is available and $r_{xx}$ is the
correlation between two applications of the same test form. Then
it cannot be assumed that $r_{e_1 e_2} = 0$ since the errors an individual
makes on one application of the test will not be unrelated to
those made on the other. If $r_{e_1 e_2}$ is positive, $r_{xx}$ is larger than
$s_\tau^2/s_x^2$ and $r_{\tau\upsilon}$ is overestimated by Formula (12.32).

The correction for attenuation, if applicable under the
assumptions, becomes a sort of ceiling indicating the highest value to
which the correlation between two measures could be pushed by
improving the reliability of measurement. Suppose two forms of
an arithmetic reasoning test have shown an intercorrelation of
$r_{xx} = .66$, and two forms of a verbal reasoning test an
intercorrelation of $r_{yy} = .71$. The correlations between a single form of the
arithmetic and a single form of the verbal test are variously
$r_{xy} = .55$, .49, .54, and .51. Is it conceivable that the two tests
are actually measuring the same function and correlations between
them are less than unity only because we have imperfect measures
of each? In order to apply Formula (12.32) some average of the
four values of $r_{xy}$ is needed. Strictly speaking we should average
the four covariances but no data are given concerning the
covariances or the standard deviations. (The covariance of x and y is
defined as $E(X - \mu_x)(Y - \mu_y) = \rho_{xy}\sigma_x\sigma_y$.) The arithmetic mean
of the four correlations is $\frac{1}{4}(.55 + .49 + .54 + .51) = .5225$. The
geometric mean is $[(.55)(.49)(.54)(.51)]^{1/4} = .522$. If each of the
4 values of r is converted to z, the z's averaged, and the average of
the z's converted back to r, the result is .52. Obviously the
method of averaging will have little effect on the outcome.

$r_{\tau\upsilon} \approx \dfrac{.52}{\sqrt{(.66)(.71)}} = .76$

In interpretation it may be said in passing that reliability
coefficients of .66 and .71 make the tests of little value as
instruments for measuring individuals.

The correlation between the two traits could not be expected
to rise higher than about .76 even if all errors of measurement could
be eliminated. The functions measured by the two tests cannot
be held to be identical.
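
The arithmetic of this illustration may be retraced with the Python sketch below. The function name `correct_for_attenuation` is an illustrative choice, and the conversion of r to z is assumed here to be Fisher's transformation, for which `math.atanh` and `math.tanh` serve as the forward and backward conversions.

```python
import math

def correct_for_attenuation(r_xy, r_xx, r_yy):
    """Formula (12.32): estimated correlation between error-free scores."""
    return r_xy / math.sqrt(r_xx * r_yy)

# The four observed correlations between single forms of the two tests.
r_values = [0.55, 0.49, 0.54, 0.51]

arithmetic_mean = sum(r_values) / len(r_values)
geometric_mean = math.prod(r_values) ** (1 / len(r_values))
# Assumed reading of "converted to z": Fisher's z, averaged, then converted back.
z_mean_r = math.tanh(sum(math.atanh(r) for r in r_values) / len(r_values))

for label, r in [("arithmetic", arithmetic_mean),
                 ("geometric", geometric_mean),
                 ("via z", z_mean_r)]:
    print(label, round(r, 4), round(correct_for_attenuation(r, 0.66, 0.71), 3))

# The value used in the text: average r of .52 with reliabilities .66 and .71.
r_true = correct_for_attenuation(0.52, 0.66, 0.71)
print(round(r_true, 3))
```

Whichever average is used, the corrected value stays close to the .76 reported above, which supports the remark that the method of averaging has little effect on the outcome.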

The Coefficient of Reliability. If two measures of trait X differ
only by random errors of measurement,

$\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$

Then

(12.33)   $\rho_{xx} = \dfrac{\sigma_\tau^2}{\sigma_x^2} = 1 - \dfrac{\sigma_e^2}{\sigma_x^2}$

With the qualifications stated on page 300 the analogous formula
for the sample holds approximately,

(12.34)   $r_{xx} \approx \dfrac{s_\tau^2}{s_x^2} \approx 1 - \dfrac{s_e^2}{s_x^2}$

These formulas will be considered as a definition of the coefficient of
reliability, or the reliability coefficient of measure X.

The question of how to obtain $r_{xx}$ from data will be discussed
in a later section. It presents certain difficulties about which a 
great deal has been written. Within the scope of the present book 
only the major problems can be considered. 

Effect on the Coefficient of Reliability of Changing the Length 
of the Test. Sometimes when the coefficient of reliability is
unsatisfactorily low, the test maker wishes to know how much it 
might be increased if the test forms were lengthened by the 
addition of a specified amount of similar material, or how much 
material would need to be added to the tests in order to produce a 
reliability coefficient of a given size. Conversely, he may feel 
that the test requires more testing time than can be justified and 
he may wish to estimate how much the reliability coefficient would 
be decreased if the test were shortened. 

To answer any of these questions one must assume a
homogeneous body of material from which test items are chosen at
random so that errors due to selection of test material are random
errors. One must also assume that errors due to variation within
the individual tested, or to variation in the administration of the
test, are random errors. Then it can be assumed that the
covariance for any two forms is the same as for any other two forms, or

$E(X_h - \mu)(X_j - \mu) = \rho_{xx}\sigma_x^2$

for any h and j. Assumptions which are intuitively clearer though
somewhat more restrictive are

(1) all forms are equally variable, so $\sigma_{x_1}^2 = \sigma_{x_2}^2 = \cdots = \sigma_x^2$
and (2) all correlations between forms are the same, so

$\rho_{x_1 x_2} = \cdots = \rho_{x_h x_j} = \rho_{xx}$


Now let ${}_2\rho_{xx}$ be the correlation between $(X_1 + X_2)$ and $(X_3 + X_4)$
and ${}_3\rho_{xx}$ be the correlation between $(X_1 + X_2 + X_3)$ and $(X_4 + X_5 + X_6)$
and ${}_p\rho_{xx}$ be the correlation between $(X_1 + \cdots + X_p)$
and $(X_{p+1} + \cdots + X_{2p})$

(12.35)   Then ${}_2\rho_{xx} = \dfrac{2\rho_{xx}}{1 + \rho_{xx}}$

(12.36)   and approximately ${}_2r_{xx} \approx \dfrac{2r_{xx}}{1 + r_{xx}}$

(12.37)   also ${}_p\rho_{xx} = \dfrac{p\rho_{xx}}{1 + (p - 1)\rho_{xx}}$

(12.38)   and approximately ${}_pr_{xx} \approx \dfrac{p\,r_{xx}}{1 + (p - 1)r_{xx}}$

Formula (12.38) is commonly called the Spearman-Brown Prophecy
Formula. The names of Charles Spearman and William Brown
are both attached to it because papers by these men appeared
simultaneously in the same issue of the same journal.


Illustrations.
A. A test with reliability coefficient .52 requires 15 minutes to
administer. If it could be increased in length to require 60 minutes, how
much increase might be hoped for in the reliability coefficient? Here
$p = \frac{60}{15} = 4$. Then

${}_4r_{xx} \approx \dfrac{4(.52)}{1 + 3(.52)} = \dfrac{2.08}{2.56} = .81$

Actual experience in lengthening the test and giving it to a new group is
of course necessary, because the added material may not be entirely
comparable to the original items, and individuals may work either more
or less effectively in the longer period.

B. If a reliability coefficient of .90 were required for the purpose in
hand, how long would the test need to be? Now p is the unknown to be
found from the equation

$.90 = \dfrac{p(.52)}{1 + (p - 1)(.52)}$


Solving this equation gives p = 8.3 so the time presumably required would 
be (8.3)(15) minutes or 125 minutes. If that appears to be more testing 
time than can be suitably allowed, the test maker may consider whether 
he can tolerate a lower reliability, or whether by changing the type of item 
he can secure greater reliability in less testing time. He will have dis- 
covered without laborious writing of new items that sheer lengthening of 
the test by addition of similar material is not likely to achieve the desired 
result. 
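
The arithmetic of Illustrations A and B can be checked with a few lines of Python. This is only a sketch and not part of the original presentation; the function names `spearman_brown` and `length_factor` are illustrative, and both assume that the test is lengthened by comparable material, as Formula (12.38) requires.

```python
def spearman_brown(r_xx: float, p: float) -> float:
    """Reliability anticipated when a test is made p times as long (Formula 12.38)."""
    return p * r_xx / (1 + (p - 1) * r_xx)


def length_factor(r_xx: float, target: float) -> float:
    """Formula (12.38) solved for p: the lengthening needed to reach `target`."""
    return target * (1 - r_xx) / (r_xx * (1 - target))


# Illustration A: a 15-minute test with reliability .52 lengthened to 60 minutes.
print(round(spearman_brown(0.52, 60 / 15), 2))   # 0.81

# Illustration B: lengthening needed for a reliability of .90, and minutes required.
p = length_factor(0.52, 0.90)
print(round(p, 1), round(p * 15))                # 8.3 125
```

Solving for p algebraically, rather than by trial, makes it plain how quickly the required length grows as the target reliability approaches 1.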

Effect on the Correlation between Two Measures of Changing
their Reliabilities. A. When the data are given in terms of reliability
coefficients. (1) Suppose $r_{xy} = .55$ when $r_{xx} = .65$ and $r_{yy} = .75$.
If each reliability coefficient could be made .85, what value of $r_{xy}$
could be anticipated? Let $r'_{xy}$ designate the value of $r_{xy}$ to be
anticipated when the two reliability coefficients are $r'_{xx} = .85$ and
$r'_{yy} = .85$.

By Formula (12.32)   $r_{\tau\upsilon} = \dfrac{r_{xy}}{\sqrt{r_{xx} r_{yy}}} = \dfrac{.55}{\sqrt{(.65)(.75)}}$

and also   $r_{\tau\upsilon} = \dfrac{r'_{xy}}{\sqrt{r'_{xx} r'_{yy}}} = \dfrac{r'_{xy}}{\sqrt{(.85)(.85)}}$

and so   $r'_{xy} = \dfrac{.55\sqrt{(.85)(.85)}}{\sqrt{(.65)(.75)}} = .67$

Generalizing this procedure, we may write

(12.39)   $r'_{xy} = r_{xy}\sqrt{\dfrac{r'_{xx}\, r'_{yy}}{r_{xx}\, r_{yy}}}$

(2) Suppose $r_{xy} = .55$ when $r_{xx} = .65$ and $r_{yy} = .75$. If measures
of $X$ could be made completely reliable without any change in
the reliability of $Y$, what value of $r'_{xy}$ might be anticipated? The
question implies that $r'_{xx} = 1.00$ and $r'_{yy} = r_{yy}$, and that $r'_{xy}$ is really
to be $r_{\tau y}$ because measures of $X$ are to be free of error. When these
values are substituted, Formula (12.39) becomes

(12.40)   $r_{\tau y} = \dfrac{r_{xy}}{\sqrt{r_{xx}}}$

and similarly   $r_{x\upsilon} = \dfrac{r_{xy}}{\sqrt{r_{yy}}}$

Then   $r_{\tau y} = \dfrac{.55}{\sqrt{.65}} = .68$
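
Both numerical cases can be verified with a short Python sketch of Formula (12.39). The function name `anticipated_r` is illustrative only; the second call sets the new reliability of X at 1.00 and leaves that of Y unchanged, which is the situation described by Formula (12.40).

```python
from math import sqrt


def anticipated_r(r_xy, r_xx, r_yy, new_r_xx, new_r_yy):
    """Formula (12.39): r_xy anticipated after the two reliabilities change."""
    return r_xy * sqrt((new_r_xx * new_r_yy) / (r_xx * r_yy))


# (1) Both reliability coefficients raised to .85.
print(round(anticipated_r(0.55, 0.65, 0.75, 0.85, 0.85), 2))  # 0.67

# (2) X made completely reliable, Y unchanged (Formula 12.40).
print(round(anticipated_r(0.55, 0.65, 0.75, 1.00, 0.75), 2))  # 0.68
```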


B. When the data are given in terms of the proportionate change in
length of tests. Suppose it is desired to find the correlation between
$(X_1 + \cdots + X_p)$ and $(Y_1 + \cdots + Y_q)$ where it is assumed that
$X_1, \ldots, X_p$ are $p$ comparable measures of $X$ and $Y_1, \ldots, Y_q$ are $q$
comparable measures of $Y$. Let this correlation be designated
${}_{pq}\rho_{xy}$. We speak here as though $p$ and $q$ were integers but the
resultant formulas apply even when $p$ and $q$ are fractions. Let
either one of the following sets of assumptions be made:

(1) The covariance of any two measures of $X$ is $\rho_{xx}\sigma_x^2$.
    The covariance of any two measures of $Y$ is $\rho_{yy}\sigma_y^2$.
    The covariance of any measure of $X$ and any measure of $Y$
    is $\rho_{xy}\sigma_x\sigma_y$.

(2) All measures of $X$ have the same variance $\sigma_x^2$.
    All measures of $Y$ have the same variance $\sigma_y^2$.
    The correlation between any two measures of $X$ is $\rho_{xx}$.
    The correlation between any two measures of $Y$ is $\rho_{yy}$.
    The correlation between any measure of $X$ and any
    measure of $Y$ is $\rho_{xy}$.

On the basis of these assumptions

(12.41)   ${}_{pq}\rho_{xy} = \dfrac{pq\,\rho_{xy}}{\sqrt{p + p(p-1)\rho_{xx}}\;\sqrt{q + q(q-1)\rho_{yy}}}$

and approximately

(12.42)   ${}_{pq}r_{xy} = \dfrac{pq\,r_{xy}}{\sqrt{p + p(p-1)r_{xx}}\;\sqrt{q + q(q-1)r_{yy}}}$

Suppose $r_{xy}$ has been found to be .35 when $r_{xx} = .58$ and $r_{yy} = .65$.
It is decided that the test for $X$ can be made twice as long and
that for $Y$ three times as long as at present. What correlation
might be anticipated between $X$ and $Y$? Here $p = 2$ and $q = 3$.
Then

${}_{pq}r_{xy} = \dfrac{6(.35)}{\sqrt{2 + 2(.58)}\;\sqrt{3 + 6(.65)}} = \dfrac{2.10}{\sqrt{(3.16)(6.90)}} = \dfrac{2.10}{4.67} = .45$
It is interesting to note that Formulas (12.32), (12.36), (12.38), 
and (12.40) are special cases of (12.42) and can be derived from 
it by appropriate substitutions for p and q. 
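
A brief Python sketch may make the general formula and one of its special cases concrete. The function name `lengthened_r` is illustrative. The second call treats Y as a second comparable form of X, so that all three coefficients are equal, and sets p = q; under those assumptions the general formula should reduce to the Spearman-Brown value of Formula (12.38).

```python
from math import sqrt


def lengthened_r(r_xy, r_xx, r_yy, p, q):
    """Formula (12.42): correlation between an X test made p times as long
    and a Y test made q times as long."""
    return (p * q * r_xy) / (
        sqrt(p + p * (p - 1) * r_xx) * sqrt(q + q * (q - 1) * r_yy)
    )


# Example from the text: r_xy = .35, r_xx = .58, r_yy = .65, p = 2, q = 3.
print(round(lengthened_r(0.35, 0.58, 0.65, 2, 3), 2))  # 0.45

# Special case: Y is a comparable form of X and p = q, giving Formula (12.38).
r, p = 0.52, 4
print(round(lengthened_r(r, r, r, p, p), 4))           # 0.8125
```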

Effect of Measurement Error on a Regression Coefficient. 
Random errors of measurement have been seen to reduce the 
correlation coefficient and to increase the standard deviation. 
What effect do these errors have upon the regression coefficient

$\beta_{\upsilon\tau} = \dfrac{E(\upsilon - \mu_\upsilon)(\tau - \mu_\tau)}{E(\tau - \mu_\tau)^2}$

and upon its sample counterpart $b_{\upsilon\tau}$?
How do these coefficients to predict $\upsilon$ from $\tau$ compare with the
corresponding coefficients to predict $Y$ from $X$?
The question is easily answered by comparing the formulas

$\beta_{\upsilon\tau} = \dfrac{E(\upsilon - \mu_\upsilon)(\tau - \mu_\tau)}{E(\tau - \mu_\tau)^2}$   and   $\beta_{yx} = \dfrac{E(Y - \mu_y)(X - \mu_x)}{E(X - \mu_x)^2}$

Errors in $Y$, which is the dependent or predicted variable, can
obviously have no effect on the denominator. Random errors
in $X$, which is the independent variable, have already been seen
to make $E(X - \mu_x)^2 = \sigma_x^2$ larger than $E(\tau - \mu_\tau)^2 = \sigma_\tau^2$. See
Formula (12.13).

Random errors in either $X$ or $Y$ have no effect upon the numerator.
If errors are uncorrelated with each other and with true
scores, then

$E(X - \mu_x)(Y - \mu_y) = E(\tau - \mu_\tau + e)(\upsilon - \mu_\upsilon + d)$
$\qquad = E(\tau - \mu_\tau)(\upsilon - \mu_\upsilon) + E(\upsilon - \mu_\upsilon)e + E(\tau - \mu_\tau)d + E(ed)$
$\qquad = E(\tau - \mu_\tau)(\upsilon - \mu_\upsilon) + 0 + 0 + 0$

Similar statements hold approximately for the sample coefficients
$b_{\upsilon\tau}$ and $b_{yx}$.

Therefore it may be stated that random errors in the predicted 
or dependent variable have no effect upon a regression coefficient 
and that random errors in the independent variable reduce the 
coefficient.
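
The conclusion can be illustrated with a small constructed example in Python. The scores below are hypothetical and were chosen so that the errors are exactly uncorrelated with each other and with the true scores; the variable names are illustrative. Under these assumptions, errors in the dependent variable should leave the coefficient unchanged, while errors in the independent variable should reduce it in the ratio of true variance to observed variance (here one half).

```python
def slope(x, y):
    """Least-squares regression coefficient for predicting y from x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Hypothetical true scores and errors, built to be exactly uncorrelated.
tau = [1, 1, 1, 1, -1, -1, -1, -1]      # true scores on X
ups = [2 * t for t in tau]              # true scores on Y
e = [1, 1, -1, -1, 1, 1, -1, -1]        # errors of measurement in X
d = [1, -1, 1, -1, 1, -1, 1, -1]        # errors of measurement in Y

x = [t + err for t, err in zip(tau, e)]     # observed X
y = [u + err for u, err in zip(ups, d)]     # observed Y

print(slope(tau, ups))  # 2.0  coefficient for true scores
print(slope(tau, y))    # 2.0  errors in the dependent variable only
print(slope(x, y))      # 1.0  errors in the independent variable as well
```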

Effect of Measurement Error on Tests of Significance. In 
general, random errors of measurement make the null hypothesis 
appear more tenable than it would be if true measures could have 
been used. 

In an earlier section of this chapter it was shown that random
errors of measurement have no effect on the mean of a population,
so that $\mu_x = \mu_\tau$. Their chief effect on the mean of a sample is
to increase its sampling variability. As $\sigma_\tau^2$ is increased by
measurement error to $\sigma_x^2 = \sigma_\tau^2 + \sigma_e^2$, then $\sigma_{\bar{x}}^2$ is increased from

$\dfrac{\sigma_\tau^2}{N}$   to   $\dfrac{\sigma_\tau^2 + \sigma_e^2}{N}$

When several means are to be compared by use of an $F$ ratio,
random errors of measurement may raise or lower any of the means
slightly, but the effect on the numerator of the $F$ ratio is small
and may be such as to make the means either more alike or less
alike. However, the denominator sum of squares is consistently
increased by random errors and so the $F$ ratio is reduced by them.
The $t$ ratio of the difference between two means to its standard
error is similarly reduced by random errors. When an investigator
has accepted the hypothesis that two or more means are equal in
the population he must always keep in mind the possibility that
a very real difference has been obscured by the measurement error
present in his data. This is one of the reasons why some evidence
concerning the reliability of the data should be sought.

The $F$ test of significance for $r$ is affected in the same direction
by unreliability of the scores. As $r$ is attenuated by error,

$F = \dfrac{r^2 (N - 2)}{1 - r^2}$

is also reduced in size. In general, the greater the measurement
error, the less likely is it that any statistical test will prove
significant.
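
A numerical sketch in Python shows the size of the effect. All figures below are hypothetical. The sketch assumes the familiar F ratio for a correlation coefficient, $F = r^2(N-2)/(1-r^2)$ with 1 and $N - 2$ degrees of freedom, and the attenuation relation $r_{xy} = r_{\tau\upsilon}\sqrt{r_{xx} r_{yy}}$ used earlier in the chapter.

```python
from math import sqrt


def f_for_r(r: float, n: int) -> float:
    """F ratio (1 and n - 2 degrees of freedom) for testing that rho is zero."""
    return r ** 2 * (n - 2) / (1 - r ** 2)


n = 50                      # hypothetical sample size
r_true = 0.40               # hypothetical correlation between true scores
r_xx, r_yy = 0.65, 0.75     # hypothetical reliability coefficients

r_observed = r_true * sqrt(r_xx * r_yy)     # correlation attenuated by error

print(round(r_observed, 3))                 # 0.279
print(round(f_for_r(r_true, n), 2))         # 9.14
print(round(f_for_r(r_observed, n), 2))     # 4.06
```

With these assumed values the F ratio falls to less than half of what error-free measures would have produced.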

Method of Estimating a Reliability Coefficient from Data. 
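Although the reliability coefficient has been defined by the formulas

$\rho_{xx} = \dfrac{\sigma_\tau^2}{\sigma_x^2} = 1 - \dfrac{\sigma_e^2}{\sigma_x^2}$   and   $r_{xx} = \dfrac{s_\tau^2}{s_x^2} = 1 - \dfrac{s_e^2}{s_x^2}$,

and some aspects of its meaning have been discussed, there has as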
yet been no discussion of how to obtain it from data. The reader 
has probably noticed this omission and attributed it to oversight. 
The omission has been deliberate. Because every possible method 
of estimating a reliability coefficient from data has certain draw- 
backs, it has seemed better not to identify the concept with a 
particular experimental procedure at too early a stage in the 
reader's thinking. 

There is no direct way to measure either $\tau$ or $e$ but only
$X = \tau + e$. Consequently there is no direct way to compute
$s_\tau^2$ or $s_e^2$. If two measures of $\tau$ can be obtained such that they
differ only by random errors of measurement, $X_1 = \tau + e_1$ and
$X_2 = \tau + e_2$, then the correlation and variance of observed measures
can be used to estimate $s_\tau^2$ as $r_{x_1 x_2} s_x^2$. The difficulty is to set up an
experimental situation in which $X_1$ and $X_2$ actually conform to
this assumption. There are four principal experimental procedures
commonly used, each of which has some advantages and involves
some serious drawbacks. These are: (A) correlating scores from
two comparable but different tests given on two different occasions;
(B) correlating scores on a single test given twice with a
time interval between the repetitions; (C) correlating scores on
two tests given on the same occasion, or scores on two halves of
a single application of a single test; and (D) analyzing the variance
among items on a single application of a single test. In considering
the appropriateness of each of these four procedures,
the essential requirements are (1) that $\tau$ shall be a measure of the
trait it is desired to measure and shall not change from one form
to another, or from one occasion to another, and (2) that $e_1$ and $e_2$
shall be random errors due to variation of the kind it is desired
to appraise. It is therefore especially important to make an
a priori analysis of the four procedures. We shall now discuss
the statistical aspects of these four procedures. For practical
advice as to how to construct tests which are likely to conform to
the assumptions made, the reader should consult a treatise on
test construction.
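
The estimate of the true variance as the product of the correlation and the observed variance can be seen at work in a constructed example. The Python sketch below uses hypothetical scores in which the two sets of errors are exactly uncorrelated with each other and with the true scores; in real data, of course, the true scores are never available for such a check.

```python
from math import sqrt


def variance(v):
    """Sample variance with N - 1 in the denominator."""
    m = sum(v) / len(v)
    return sum((a - m) ** 2 for a in v) / (len(v) - 1)


def correlation(u, v):
    """Product moment correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


# Hypothetical true scores and two sets of uncorrelated errors.
tau = [3, 3, 3, 3, -3, -3, -3, -3]
e1 = [1, 1, -1, -1, 1, 1, -1, -1]
e2 = [1, -1, 1, -1, 1, -1, 1, -1]

x1 = [t + e for t, e in zip(tau, e1)]   # first observed measure
x2 = [t + e for t, e in zip(tau, e2)]   # second observed measure

r_12 = correlation(x1, x2)
print(round(r_12, 3))                   # 0.9
print(round(r_12 * variance(x1), 3))    # 10.286, estimate of the true variance
print(round(variance(tau), 3))          # 10.286, the true variance itself
```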

A. Correlating scores on two different test forms given on two 
different occasions. Suppose a large pool of test items is available, 
each of which has been constructed to conform to specifications 
drawn up for the test. Then let items be drawn at random from 
this pool and assigned at random to the two test forms. Under 
such a plan, the two forms should measure the same $\tau$ and differences
between them should be due to random errors in the sampling
of items. If items are not assigned at random to the two test
forms, these may be either more alike or less alike than under
random assignment of items. If they are more alike, $s_e^2$ is
underestimated and $r_{xx}$ is overestimated. If they are less alike, $s_e^2$ is
overestimated and $r_{xx}$ is underestimated. In many situations,
some of which are discussed by Thorndike and by Cronbach in
the references cited, it is impossible to secure a large pool of items
or to construct comparable test forms. If the two test forms are
not of equal difficulty, or if the items are of two different types
(say one test is a completion test and one a multiple-choice test
on the same subject matter), or if the subject matter differs from
test to test, then the two forms are not measuring the same $\tau$.

The purpose of allowing a time interval to elapse between the 
administration of the two tests is to permit temporary states of 
attention, fatigue, well-being and the like in the subjects, and 
accidental advantages or disadvantages in the environment to 
register differences in the scores. The time interval
should not, however, be great enough for the real abilities of the subjects to
undergo a change, for in that case the two scores would not be
representing the same $\tau$. If the second test is given immediately
after the first, temporary states of well-being of the subjects will
have similar effect on both scores, $s_e^2$ will be underestimated and
$r_{xx}$ overestimated.

If two comparable test forms are available and if it is
administratively feasible to allow time for testing on two different days,
this procedure appears to give the most satisfactory estimate of
$r_{xx}$. The error variance then represents variation due both to
selection of test items and to fluctuation in performance of
subjects.

B. Correlating scores on repetitions of the same test form. In
this procedure the task is precisely the same on the two
administrations. Therefore, if the sampling of test items favors a particular
subject unduly on one occasion it does so on both, and departures
from his true score because of sampling of test items do not show
up in the error variance but work to increase $r_{xx}$ unduly. In this
procedure the error variance is due to differences in the
performance of the subjects on two occasions, so $r_{xx}$ is measuring the
stability of the subjects' performance rather than the reliability of
the test. If in the interval between repetitions, some of the
subjects have had more practice than others in the function tested or
have had some experience which changed them in respect to that
function, the corresponding changes in score will appear as errors
when actually they represent changes in $\tau$. Then $s_e^2$ will be
overestimated and $r_{xx}$ underestimated.

Sometimes the only error in which the investigator is interested 
is performance error. For example, consider the archery data 
presented in Chapters 9 and 14. The essential task is uniform, 
shooting at a uniform target under conditions as nearly standard 
as they can be made. The stability of performance of the subjects 
is properly the aspect of reliability to be studied. In measures of 
posture, of metabolism, of reaction time, or in general of physical 
states or simple skills, the essential kind of error is performance 
error in the subject and perhaps also in the person administering 
the test. No sampling of test items is called for. 

In measures of knowledge, achievement, temperament, attitude,
opinion, and the like, repetition of the same test form usually
greatly underestimates $s_e^2$ and overestimates $r_{xx}$. Errors in
measuring an individual because the test items are a sample out
of all possible items should contribute to $s_e^2$ but cannot do so when
the same test form is repeated. Furthermore, there is the risk that
some memory of answers given on the first test may affect answers
on the second, thus further reducing $s_e^2$ and fallaciously increasing
$r_{xx}$.

C. Correlating scores on two test forms given on the same occasion.
(a) When two distinct test forms are available. The first
paragraph of the discussion under A applies here also. The error
variance includes variance due to the sampling of test items.
Variance due to instability of performance of individual subjects
does not appear as error but as part of the measure of $\tau$ in both
tests, thus raising the estimate of $r_{xx}$. (b) When only one test form
is available and two scores are obtained from it by subdivision of the
items. This procedure is very widely used. Perhaps there is no
second form of the test. Perhaps time is not available for
administering two complete forms. Perhaps a test maker is developing
an instrument and needs a preliminary estimate of its reliability
before he carries out the work of standardizing two comparable
forms. The single test form is administered, its items are in some
manner divided so as to form two half-tests, and the scores on
these half-tests are correlated. Then the Spearman-Brown
Prophecy Formula, Formula (12.36), is applied,

${}_2 r_{xx} = \dfrac{2 r_{xx}}{1 + r_{xx}}$

with the correlation between the half-tests used as $r_{xx}$. The
resulting value of ${}_2 r_{xx}$ is then assumed to be the correlation which
would have been obtained between scores on two full test forms
had such been available. Certain comments on the method should
be made.

(1) The error variance takes account of the sampling of test 
items but not of the variation in subjects' performance from time 
to time. 

(2) Sometimes it is impossible to subdivide a test into comparable
halves. Such is the case when success on one item is
conditioned by success on previous items, or when speed plays an
important role. If a test is to be timed, the half-tests should be
constructed in advance, and each half-test given independently
with its own time limits. Suppose a test consists of four blocks of
questions. For example, suppose a reading test consists of 4 long
passages with several questions on each. If half the passages and
all the questions on each of them are placed in one form, the
correlation between the two forms may be low. The application
of the Prophecy Formula will in that case suggest the increase in
reliability to be expected if the test is increased by the addition
of new passages. If half the questions on every passage are placed
in one form, the correlation may be quite high. The application
of the Prophecy Formula can in this case be used only to estimate
the correlation between two test forms obtained by the addition
of new questions based on the same set of passages. As it is usually
not possible to find a large number of new and equally good
questions, and as one is more likely to want to generalize to a
universe of passages than to a universe of questions about a fixed
set of passages, this second method of subdividing items has little
to recommend it.

(3) The number of ways in which the items of a test can be 
subdivided to produce two half-tests is very large, and each of them 
produces a different correlation between the test halves. If, for 
example, a test contains 20 items, the number of ways in which 
the items can be assigned to the two half-tests is 92378. 

(4) The Spearman-Brown Prophecy Formula is derived from 
rather restrictive assumptions (stated earlier in this chapter) to 
which the practical situation may not conform. 
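
The split-half routine and the count of possible splits mentioned in comment (3) can be sketched in Python. The item record below is hypothetical, and the odd-even division is only one of the many possible subdivisions; the function names are illustrative. The count of splits is the number of ways of choosing 10 items out of 20, halved because exchanging the two halves does not produce a new split.

```python
from math import comb, sqrt


def correlation(u, v):
    """Product moment correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / sqrt(suu * svv)


def split_half_reliability(records):
    """Correlate odd-item and even-item half scores, then step the result
    up to full length with the Spearman-Brown formula (12.36)."""
    odd = [sum(row[0::2]) for row in records]    # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in records]   # items 2, 4, 6, ...
    r_half = correlation(odd, even)
    return 2 * r_half / (1 + r_half)


# Hypothetical record: 6 subjects (rows) on 4 items (columns), scored 1 or 0.
records = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(round(split_half_reliability(records), 3))

# Number of ways of dividing 20 items into two half-tests of 10 items each.
print(comb(20, 10) // 2)   # 92378
```

A different assignment of the same items to the two halves would in general give a different coefficient, which is the indeterminacy noted in comment (3).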

D. Measuring the internal consistency of a test. The preceding 
section referred to the great variety of ways in which a test can be 
subdivided to form two half-tests and the consequent indeter- 
minacy of the estimate of the reliability coefficient obtained by 
correlating scores on the halves. This indeterminacy has caused 
research workers to look for a method which is independent of 
any particular split-up of items. A formula which serves this purpose
was developed by Kuder and Richardson. The symbols used
here are slightly different from those in the original presentation.

$N$ = number of individuals taking the test.

$m$ = number of items in the test.

$p_i$ = proportion of individuals answering the $i$th item correctly.

$Np_i$ = number of individuals answering the $i$th item correctly.

$q_i = 1 - p_i$ = proportion of individuals not answering the $i$th
item correctly.

$Nq_i$ = number of individuals not answering the $i$th item correctly.

$s_x^2$ = variance of the test scores of the $N$ individuals if each
score is the number of items answered correctly by the
individual.

(12.43)   Then   $r_{xx} = \dfrac{m}{m-1}\left(1 - \dfrac{N\sum p_i q_i}{(N-1)s_x^2}\right)$

(12.44)   $r_{xx} = \dfrac{m}{m-1}\left(1 - \dfrac{N\left(\sum p_i - \sum p_i^2\right)}{(N-1)s_x^2}\right)$

(12.45)   $r_{xx} = \dfrac{m}{m-1}\left(1 - \dfrac{\sum (Np_i)(Nq_i)}{N(N-1)s_x^2}\right)$

Formulas (12.43), (12.44), and (12.45) are equivalent. They
provide a measure of the internal consistency of the items in the
test. If data come to the computer in terms of the proportion of
persons answering an item correctly, Formula (12.44) provides a
convenient routine for machine computation since $\sum p_i$ and $\sum p_i^2$ can
be obtained simultaneously. If the number of persons answering
each item correctly is available, Formula (12.45) should be used
because it involves less computational labor and less error from
rounding.

These formulas provide a kind of average of the various reli- 
ability coefficients which could be obtained from all the possible 
ways of subdividing the m test items, but that fact is not imme- 
diately obvious and requires more algebra for its development 
than it seems appropriate to introduce here. 

The variance of scores, $s_x^2$, can be written as the sum of all item
covariances and variances if each correct item is scored 1 and each
incorrect item 0. In such scoring the variance of an item is
$Np_iq_i/(N - 1)$. Hence $N\sum p_iq_i/(N - 1)$ is the sum of item variances
in the sample. In Formula (12.43) the quantity $N\sum p_iq_i/(N - 1)s_x^2$
is the sum of item variances divided by the sum of item variances
and covariances, each covariance being based on a pair of items.

If the sum of all the covariances were zero, the numerator and
denominator would be equal, the quantity in parentheses would
be zero and $r_{xx}$ would be zero. However this situation is very
unlikely for it is almost inconceivable that a test maker would
put together a set of wholly unrelated items. It is still less likely
that he would formulate a test in which many of the inter-item
correlations would be negative.

If the items are all measures of the same variable they should
show positive correlations. If these inter-item correlations are
high and positive, the inter-item covariances are large and positive,
and the denominator $s_x^2$, which is the sum of variances and
covariances, is much larger than the numerator $N\sum p_iq_i/(N - 1)$, which is
the sum of variances only. Then the fraction $N\sum p_iq_i/(N - 1)s_x^2$ is small
(though it cannot be zero) and $r_{xx}$ is near to 1. In this case the
items are highly consistent with each other.

Kuder and Richardson have given several variations of Formula
(12.43) under different assumptions. Other proofs have been
given by Jackson and by Hoyt.

As an application, consider a test of 12 items given to 9 subjects, 
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each item being scored 1 if answered correctly and 0 if answered 
incorrectly as in Table 12.2. A genuine problem would be likely 
to involve a far greater number of subjects, but the pattern of 
procedure is the same, and the reader will learn as much from a 
small computation as from a long one. 


TABLE 12.2. Record of 9 Subjects on 12 Items 


Record on Item i Made by Subject a 
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Summing the columns of Table 12.2 produces the scores of the 
subjects, $X_a$. For these 9 scores,

$\sum (X - \bar{X})^2 = (N - 1)s_x^2 = 60.$

Summing the rows produces the item scores $Np_i$. Subtracting
each $Np_i$ from $N = 9$ produces $Nq_i$. As a check we note that

$\sum_{i=1}^{12} Np_i = 72 = \sum_{a=1}^{9} X_a$,

that $\sum Nq_i = 36$,
and that $\sum Np_i + \sum Nq_i = 108 = 9 \times 12$.
If the computation is completed by Formula (12.45) we first
find $\sum (Np_i)(Nq_i) = 20 + 20 + 14 + 8 + \cdots + 18 + 20 = 198$ and

$r_{xx} = \dfrac{12}{11}\left(1 - \dfrac{198}{9(60)}\right) = .69$

If the computation is completed by Formula (12.44), we obtain
the column $p_i$ by dividing the $Np_i$ column by $N = 9$. It is necessary
to carry these values out to several decimal places to reduce
the effect of rounding error. Then $\sum p_i = 8.002$ is checked against
the expected value $72/9 = 8$. $\sum p_i^2 = 5.559$.

$r_{xx} = \dfrac{12}{11}\left(1 - \dfrac{9(8.002 - 5.559)}{60}\right) = .69$
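
The routine of Formula (12.45) can be sketched in Python. The function names are illustrative. The first call uses only the summary values quoted above for Table 12.2 (12 items, 9 subjects, a sum of products of 198, and a sum of squared deviations of 60); the second applies the same routine to a small hypothetical record of item scores, since the body of Table 12.2 is not reproduced here.

```python
def kr20_from_sums(m, n, sum_np_nq, sum_sq_dev):
    """Formula (12.45), with sum_sq_dev standing for (N - 1) times the
    variance of the test scores."""
    return m / (m - 1) * (1 - sum_np_nq / (n * sum_sq_dev))


def kr20(records):
    """Apply Formula (12.45) to a subjects-by-items record of 1/0 scores."""
    n = len(records)                     # N, number of individuals
    m = len(records[0])                  # m, number of items
    totals = [sum(row) for row in records]
    mean = sum(totals) / n
    sum_sq_dev = sum((x - mean) ** 2 for x in totals)
    correct = [sum(row[i] for row in records) for i in range(m)]   # N p_i
    sum_np_nq = sum(c * (n - c) for c in correct)
    return kr20_from_sums(m, n, sum_np_nq, sum_sq_dev)


# Summary values quoted in the text for Table 12.2.
print(round(kr20_from_sums(12, 9, 198, 60), 2))             # 0.69

# Hypothetical record: 4 subjects (rows) on 3 items (columns).
print(kr20([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))   # 0.75
```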
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13 Multiple Regression and 
Correlation 


Methods of using scores on one variable to predict
scores on another related variable were discussed in Chapter 10,
and tests of significance for the correlation coefficient and the
regression coefficient were given there. In this chapter scores
on two or more variables will be combined to predict scores on
another variable called the criterion and the following questions
will be considered: What weight should be assigned to each of the
predictor variables in order to obtain the best estimate of the
criterion variable? How good is that "best estimate"? What
computational routines are economical and efficient? How can
significance tests be made for the coefficients in the prediction
equation and for the coefficient of multiple correlation?

In this chapter it will be assumed that all correlation
coefficients are product moment coefficients as defined in Chapter 10.
Prediction of Semester Grade in a First Course in Statistics. 
In order to estimate in advance the amount of difficulty students 
were likely to have in an introductory course in statistical methods, 
three prognostic tests were administered at the first session. One 
of these was a 45-minute test in reading difficult material. One 
was a specially constructed test of simple arithmetic and algebraic 
relationships of the sort most often encountered in statistical 
problems. One was an artificial language test involving unfamiliar 
symbols in logical systems of operation. Three class sections 
were available. Sections I and III met at the same evening period 
for a total of 3 hours a week. Students with good mathematical 
preparation and high scores on the prognostic tests were assigned 
to Section I while students who anticipated difficulty because of 
poor preparation and who had low scores on the tests were assigned 
to Section III, which was considerably smaller than Section I. 
Section II met in the morning for a total of 4 hours a week, and 
included students with widely varying background. The three 
sections were taught by three different teachers. When these
tests were first given, only more or less intuitive speculation
could be used to indicate how scores might be interpreted to
predict success in the course, as no criterion was then available.
At the end of the semester, scores on a midterm test and on a final
test, and the average of these two tests recorded as the semester
mark, provided criteria against which the effectiveness of the
prognostic tests could be studied. A formula thus developed to
estimate one or the other of these criteria from the three
placement tests could then be used in giving advice to similar classes
in subsequent terms.

TABLE 13.1 Scores Made by Students in a First Course in Statistical Methods
on Three Prognostic Tests Given at Beginning of Term, Two Examinations in
Subject Matter of the Course, and the Semester Grade (* indicates a woman
student.)

               Prognostic test score            Criterion score
Code  Section  Reading  Artificial  Arithmetic  Midterm  Final  Semester
                        language                test     test   grade
   1  III      33       51          33          58       65     62
   2  III      39       50          36          53       51     52
   3  I        38       38          39          52       53     53
   4  I        23       27          29          47       42     45
  *5  I        44       40          41          61       50     56
  *6  I        45       54          38          47       53     50
   7  I        48       57          42          64       64     64
   8  II       38       39          35          64       54     59
   9  II       32       46          23          58       50     54
 *10  II       28       45          37          44       45     45
  11  II       33       53          28          62       63     63
  12  II       40       43          25          52       50     51
  13  II       34       55          38          50       56     58
  14  II       34       57          36          71       68     70
  15  II       35       46          37          62       65     64
  16  I        32       43          39          52       59     56
  17  II       34       44          38          59       57     58
  18  II       42       51          35          53       60     57
  19  I        32       55          36          62       56     59
 *20  III      24       34          17          37       43     40
  21  I        39       52          34          55       53     54
  22  I        34       25          29          30       42     36
  23  II       44       58          35          61       56     59
  24  III      27       29          26          49       40     45
  25  II       34       52          30          43       38     41
  26  II       40       53          29          61       54     58
  27  I        37       34          26          47       57     52
  28  III      22       40          33          47       50     49
  29  I        30       54          40          56       47     52
 *30  I        36       58          25          52       39     45
  31  I        35       43          40          49       37     43
  32  I        39       58          32          49       37     43
  33  II       37       14          43          36       40
  34  II       38       51          26          40       42     41
  35  II       36       55          31          59       52     56
  36  I        46       52          39          61       63     62
  37  II       44       57          32          46       53     50
  38           40       56          32          64       49     57
  39  III      38       30          18          25       38     32
  40           50       47          32          58       52     55
  41  II       16       49          26          52       36     43
  42  I        46       60          43          61       57     59
  43  I        42       56          40          59       54     57
 *44  III      27       53          26          62       48     55
  45  III      33       55          25          55       57     56
  46  I        40       54          34          52       57     55
  47  I        46       53          37          62       49     56
  48  I        46       58          39          58       56     57
 *49           25       57          24          46       49     48
 *50  III      26       45          18          47       25     36
 *51  I        42       46          31          58       56     57
 *52  I        34       60          33          62       64     63
  53  III      28       37          8           47       46     47
  54  II       21       24          16          44       41     43
  55  I        44       43          35          58       55     57
  56  I        38       47          28          58       56     57
  57  III      16       50          27          47       47     47
  58  III      40       48          35          56       38     47
  59  III      33       42          25          52       46     49
 *60  II       35       59          25          52       49     51
  61  III      33       40          28          49       47     48
  62  III      30       41          30          56       59     58
  63  I        38       47          39          62       63     63
 *64  III      36       43          26          61       50     56
 *65  I        45       57          35          53       46     50
  66  III      32       32          21          50       41     46
  67  III      36       53          27          49       43
 *68  III      21       46          39          44       33     39
  69  II       30       54          34          47       46     47
  70  II       34       49          27          50       44     48
  71  II       41       58          39          53       57     55
  72  I        44       55          38          47       46     47
 *73  II       36       14          20          30       40     35
  74  I        58       50          38          68       55     62
  75  II       36       41          30          62       56     59
  76  III      81       14          18          27       30     29
  77  III      40       46          36          58       50     54
  78  I        44       60          43          71       45     58
 *79  III      36       36          23          49       43     46
  80  I        44       51          36          53       47     50
 *81  I        34       46          37          43       49     46
 *82  I        38       56          36          64       61     63
 *83  I        38       52          33          50       53
  84  I        38       51          33          50       48     49
  85  I        46       51          40          62       54     58
  86  III      30       42          22          37       33     34
 *87  III      26       44          28          58       48     53
 *88  I        36       58          42          59       69     64
  89  I        36       43          42          46       44     45
 *90  III      32       59          36          61       55     58
 *91  III      30       23          28          38       38     38
 *92  I        31       58          31          44       51     48
  93  II       44       34          32          55       49     52
 *94  III      14       16          18          40       33     37
  95  III      44       26          30          37       19     28
 *96  I        30       58          37          61       56     59
  97  I        39       34          33          41       46     44
  98  I        32       56          30          65       51     58

The data for all students who had scores on every test are 
shown in Table 13.1. An asterisk attached to a code number 
indicates a woman student. The figures in this table can be 
analysed in a variety of ways. In this chapter we shall consider 
problems related to the prediction of a criterion score from the 
three prognostic test scores. 

Simplified Problem with Two Predictors. In order to help 
the student gain clear concepts of the procedure and the meaning 
of estimating one variable from several others and the residual 
errors involved in that process, a very small problem will now be 
worked out with only two predictors and a small number of cases 
selected at random from the data of Table 13.1. For this problem
we shall predict score on the midterm test ($Y$) from score on the
reading test ($X_1$) and score on the artificial language test ($X_2$).

Multiple Regression Equation with Two Predictors. To 
combine the two predicting variables (also called the independent 
variables) a multiple regression equation is used. For two
predictors, the general form of this equation is

(13.1)   $\hat{Y}_a = A + b_{y1.2} X_{1a} + b_{y2.1} X_{2a}$

The notation of this equation is to be interpreted as follows:

$X_{1a}$ and $X_{2a}$ are observed scores for the $a$th individual.

$A$ is a number to be computed from the sample data.

$b_{y1.2}$ is a number to be computed from the sample data
indicating the weight given to $X_1$ in the regression equation.
It is called a partial regression coefficient.

$b_{y2.1}$ is a number to be computed from the sample data. It
is the partial regression coefficient which expresses the
weight given to $X_2$ in the regression equation.

$\hat{Y}_a$ is the value of $Y$ estimated from $X_{1a}$ and $X_{2a}$ by the
regression equation.

The formulas for $b_{y1.2}$, $b_{y2.1}$, and $A$ will now be given without
justification and the rationale will be developed in a later
section.

(13.2)   $b_{y1.2} = \dfrac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2} \cdot \dfrac{s_y}{s_1}$   and   $b_{y2.1} = \dfrac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2} \cdot \dfrac{s_y}{s_2}$

(13.3)   $A = \bar{Y} - b_{y1.2}\bar{X}_1 - b_{y2.1}\bar{X}_2$


In order to display vividly the relations involved, 10 cases were 
chosen at random from the 98 cases of Table 13.1 and computations 
carried through for these 10 cases. 

The mean and standard deviation of each trait are shown in
the lower part of Table 13.2, also the three correlation coefficients.
When these values are substituted in Formula (13.2) the values
of the two regression coefficients are found to be

$b_{y1.2} = -.006$     $b_{y2.1} = .134$

Substitution of the means and $b_{y1.2}$ and $b_{y2.1}$ in Formula (13.3)
gives

$A = 51.2 + (.006)(34.1) - (.134)(49.2) = 44.9$

The regression equation is therefore $\hat{Y} = 44.9 - .006X_1 + .134X_2$.
This equation should now be applied in turn to each of the ten
individuals listed in Table 13.2, estimating his midterm test score
$Y$ on the basis of his two observed prognostic scores. These
predictions are listed in the column headed $\hat{Y}$ and the student
should verify some of them. Errors of estimate are listed in the
column $Y - \hat{Y}$ and should be verified by the student.
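
For the reader who wishes to verify the work by machine, the following Python sketch applies Formulas (13.2) and (13.3) to the ten cases of Table 13.2. The variable names are illustrative. Carried at full precision, the computation gives a value of about .133 for the second coefficient, which differs from the quoted .134 only in the last digit, presumably because of rounding at intermediate steps; the other values agree with those quoted.

```python
from math import sqrt

# Ten cases of Table 13.2: reading (X1), artificial language (X2), midterm (Y).
x1 = [39, 32, 32, 40, 32, 16, 22, 46, 38, 44]
x2 = [58, 43, 32, 54, 55, 49, 40, 53, 51, 57]
y = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Standard deviation with N - 1 in the denominator."""
    m = mean(v)
    return sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))


def corr(u, v):
    """Product moment correlation coefficient."""
    mu, mv = mean(u), mean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suv / ((len(u) - 1) * sd(u) * sd(v))


r_12, r_y1, r_y2 = corr(x1, x2), corr(y, x1), corr(y, x2)

# Formula (13.2): partial regression coefficients.
b_y1_2 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2) * sd(y) / sd(x1)
b_y2_1 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2) * sd(y) / sd(x2)

# Formula (13.3): the constant term.
a = mean(y) - b_y1_2 * mean(x1) - b_y2_1 * mean(x2)

# Predicted values and errors of estimate for the ten cases.
predicted = [a + b_y1_2 * u + b_y2_1 * v for u, v in zip(x1, x2)]
errors = [obs - est for obs, est in zip(y, predicted)]

print(round(r_12, 3), round(r_y1, 3), round(r_y2, 3))
print(round(b_y1_2, 3), round(b_y2_1, 3), round(a, 1))
print([round(p, 2) for p in predicted])
```

The errors of estimate sum to zero apart from rounding, as the Sum row of Table 13.2 indicates they should.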

When the variables are expressed in standard score form, we
shall attach a star to the symbol for the regression coefficient.
In this case the constant $A$ vanishes from the equation, leaving it

(13.4)   $\dfrac{\hat{Y} - \bar{Y}}{s_y} = b^*_{y1.2}\,\dfrac{X_1 - \bar{X}_1}{s_1} + b^*_{y2.1}\,\dfrac{X_2 - \bar{X}_2}{s_2}$

(13.5)   where   $b^*_{y1.2} = \dfrac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2}$   and   $b^*_{y2.1} = \dfrac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2}$

For the data of Table 13.2 therefore $b^*_{y1.2} = -.0084$ and $b^*_{y2.1} = .165$.

Thus $b^*$'s like the $r$'s are pure numbers which do not involve
the scale of measurement of the traits while the $b$'s are affected
by the scale of measurement. For all work in which a regression
equation is used to predict scores the $b$'s must be used unless all
variables are in standard score form. For almost all other
problems, including tests of significance, it is more convenient to use
the $b^*$'s. In later sections of this chapter several such problems

TABLE 13.2 Scores for Ten Cases Selected at Random
from Table 13.1 and Certain Statistics Derived from Those Scores

                                      X_1        X_2          Y    \hat{Y}
Code number                       Reading Artificial    Midterm  Predicted  Y - \hat{Y}  Y\hat{Y}
                                            language       test      value
32                                     39         58         49      52.34      -3.34    2564.66
16                                     32         43         52      50.39       1.61    2620.28
66                                     32         32         50      48.93       1.07    2446.50
46                                     40         54         52      51.80        .20    2693.60
22                                     32         55         62      51.98      10.02    3222.76
41                                     16         49         52      51.29        .71    2667.08
28                                     22         40         47      50.06      -3.06    2352.82
47                                     46         53         62      51.63      10.37    3201.06
34                                     38         51         40      51.41     -11.41    2056.40
37                                     44         57         46      52.17      -6.17    2399.82
Sum                                   341        492        512     512.00          0   26224.98
Sum of squares                      12429      24838      26626   26224.95     400.99
(Sum)^2 / N                       11628.1    24206.4    26214.4    26214.4          0
Sum of squares of deviations        800.9      631.6      411.6      10.55     400.99

$\sum X_1 X_2 = 17130$    $r_{12} = .496$    $\bar{X}_1 = 34.1$    $s_1^2 = 88.99$    $s_1 = 9.43$

$\sum X_1 Y = 17501$    $r_{y1} = .073$    $\bar{X}_2 = 49.2$    $s_2^2 = 70.18$    $s_2 = 8.38$

$\sum X_2 Y = 25272$    $r_{y2} = .160$    $\bar{Y} = 51.2$    $s_y^2 = 45.73$    $s_y = 6.76$

$b^*_{y2.1} = \dfrac{r_{y2} - r_{y1} r_{12}}{1 - r_{12}^2} = .165$     $b_{y2.1} = b^*_{y2.1}\,\dfrac{s_y}{s_2} = .134$

$b^*_{y1.2} = \dfrac{r_{y1} - r_{y2} r_{12}}{1 - r_{12}^2} = -.0084$     $b_{y1.2} = b^*_{y1.2}\,\dfrac{s_y}{s_1} = -.006$


will be considered. The symbol 8 is in fairly general use for the 
value here denoted as b*, having been used before there was any 
general recognition of the desirability of using different symbols 
for population parameters and sample statistics. In this text 8* 
represents the population value corresponding to the sample value 
b* and В the population value corresponding to b. 
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Effectiveness of Prediction by a Multiple Regression Equation. 
A multiple regression equation may involve a large number of 
terms and a great deal of computation and still be relatively 
useless. Two different questions have probably. occurred to the 
reader before now: (1) How effective was this equation in estimat- 
ing midterm scores for these ten students? (2) If applied to 
another random sample out of the 98 students or if applied to 
another class another term, how effective would the predictions be? 

Unless the equation is effective with the group on which it was 
obtained it cannot be expected to be useful with another group. 
However it may give very close estimations for the group on which 
it was obtained and be much less effective with a new group. The 
two issues must be considered separately. 

Intuitively one might say that a regression equation gives a
satisfactory estimation for the group on which it was obtained if
there is a high correlation between observed and estimated scores,
or if most of the variation of observed scores can be ascribed to
regression and very little to errors of estimate. These two
interpretations will now be considered in relation to the data of Table
13.2 and will be seen to be really the same interpretation.

Multiple Correlation. The coefficient of correlation between
observed scores on some trait and scores predicted for that trait
by a multiple regression equation is called a coefficient of multiple
correlation or a multiple correlation coefficient. For the data of Table
13.2 this would be the correlation between scores in the columns
Y and $\tilde{Y}$. From the final column in that table $\Sigma Y\tilde{Y} = 26224.98$.
Since the mean of $\tilde{Y}$ is the same as $\bar{Y}$,

$\Sigma y\tilde{y} = 26224.98 - (512)^2/10 = 10.58$

and

$r = \dfrac{\Sigma Y\tilde{Y} - (\Sigma Y)^2/N}{\sqrt{\Sigma(Y - \bar{Y})^2\,\Sigma(\tilde{Y} - \bar{Y})^2}} = \dfrac{10.58}{\sqrt{(411.6)(10.55)}} = .16$

The symbol for this multiple correlation will be written $R_{y.12}$. The
single primary subscript standing to the left of the dot names the
variable whose observed and estimated scores are being correlated.
The two secondary subscripts to the right of the dot name the
variables used as predictors in the regression equation. Other
symbols sometimes used with the same meaning as $R_{y.12}$ are


Ryan, Тул Туш, Tous and Еола.

The direct computation described above was employed only
for the purpose of making vivid to the student the concept of
what a multiple r means. Several formulas algebraically equivalent
to that process are available for obtaining $R_{y.12}$ more economically.
Of these one of the most convenient to use is

(13.6)   $R_{y.12} = \sqrt{r_{y1}b^*_{y1.2} + r_{y2}b^*_{y2.1}}$

Substitution of $r_{y1} = .073$, $r_{y2} = .160$, $b^*_{y1.2} = -.0084$, and
$b^*_{y2.1} = .165$ as already found, yields $R_{y.12} = \sqrt{.0256} = .16$, which
agrees with the result previously obtained by more laborious
computation.
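The agreement between the two routes to the multiple correlation can be checked with a short Python sketch. It assumes the scores of Table 13.2 and uses Formulas (13.5) and (13.6); the function names are illustrative only. Because a correlation is unchanged by a linear change of scale, the direct route here correlates Y with predictions formed in standard score form.

```python
import math

# Scores of the ten cases in Table 13.2.
X1 = [39, 32, 32, 40, 32, 16, 22, 46, 38, 44]
X2 = [58, 43, 32, 54, 55, 49, 40, 53, 51, 57]
Y = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]


def mean(v):
    return sum(v) / len(v)


def sum_products(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def correlation(u, v):
    return sum_products(u, v) / math.sqrt(sum_products(u, u) * sum_products(v, v))


def standard_scores(v):
    m = mean(v)
    s = math.sqrt(sum_products(v, v) / (len(v) - 1))
    return [(a - m) / s for a in v]


r12 = correlation(X1, X2)
ry1 = correlation(Y, X1)
ry2 = correlation(Y, X2)

# Standard regression coefficients, Formula (13.5).
b1_star = (ry1 - ry2 * r12) / (1 - r12 ** 2)
b2_star = (ry2 - ry1 * r12) / (1 - r12 ** 2)

# Route 1: Formula (13.6).
R_formula = math.sqrt(ry1 * b1_star + ry2 * b2_star)

# Route 2: correlate observed Y with the predicted standard scores.
z1, z2 = standard_scores(X1), standard_scores(X2)
z_predicted = [b1_star * a + b2_star * b for a, b in zip(z1, z2)]
R_direct = correlation(Y, z_predicted)

print(f"r12 = {r12:.3f}, ry1 = {ry1:.3f}, ry2 = {ry2:.3f}")
print(f"R by (13.6) = {R_formula:.3f}, R by direct correlation = {R_direct:.3f}")
```

Both routes give the same value, which is the point of the paragraph: Formula (13.6) is an economy of labor, not a different quantity.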
Partition of the Sum of Squares. Reference to the lower
portion of Table 13.2 shows the following sums of squares:

Sum of squares of regressed values about $\bar{Y}$,  $\Sigma(\tilde{Y} - \bar{Y})^2$ = 10.555
Sum of squares of residual errors,           $\Sigma(Y - \tilde{Y})^2$ = 400.995
                                                              411.550
Total = Sum of squares of scores about $\bar{Y}$,    $\Sigma(Y - \bar{Y})^2$ = 411.6

This partition of the total sum of squares is reminiscent of
similar relations encountered in Chapters 9 and 10. It may be
described by the formula

(13.7)   $\Sigma(Y - \bar{Y})^2 = R^2_{y.12}\,\Sigma(Y - \bar{Y})^2 + (1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2$

For these data

$\Sigma(Y - \bar{Y})^2 = (.0256)(411.6) + (1 - .0256)(411.6) = 10.54 + 401.06$

Except for rounding errors, the result of substituting $R_{y.12}$ and
$\Sigma(Y - \bar{Y})^2$ in Formula (13.7) agrees with the sums of squares read
from Table 13.2. The partition of the sum of squares may now
be interpreted in two ways.

A. Interpretation as to effectiveness of prediction for observed
group. From Formula (13.7) it appears that for this group

$R^2_{y.12}$ = proportion of the sum of squares of midterm marks
        which can be ascribed to variation in prognostic
        score
      = proportion of variation (measured as sum of
        squares) of midterm marks which might be eliminated
        if all cases were selected to have the same
        prognostic test score

$1 - R^2_{y.12}$ = proportion of variation in midterm marks which
        is independent of variation in prognostic test
        scores and so must be ascribed to other sources
        of variation
      = proportion of variation in midterm marks which
        would remain even in a group uniform as to
        prognostic test score.
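The partition of Formula (13.7) can be reproduced numerically. The sketch below assumes the scores of Table 13.2 and refits the two-predictor equation by least squares; since nothing is rounded along the way, the sums of squares agree with the tabled 10.555 and 400.995 only up to rounding.

```python
# Scores of the ten cases in Table 13.2.
X1 = [39, 32, 32, 40, 32, 16, 22, 46, 38, 44]
X2 = [58, 43, 32, 54, 55, 49, 40, 53, 51, 57]
Y = [49, 52, 50, 52, 62, 52, 47, 62, 40, 46]


def mean(v):
    return sum(v) / len(v)


def sum_products(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# Least-squares weights from the deviation-score normal equations.
s11, s22, s12 = sum_products(X1, X1), sum_products(X2, X2), sum_products(X1, X2)
s1y, s2y = sum_products(X1, Y), sum_products(X2, Y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
A = mean(Y) - b1 * mean(X1) - b2 * mean(X2)
predicted = [A + b1 * p + b2 * q for p, q in zip(X1, X2)]

y_bar = mean(Y)
ss_total = sum((y - y_bar) ** 2 for y in Y)                     # scores about the mean
ss_regression = sum((p - y_bar) ** 2 for p in predicted)        # regressed values
ss_residual = sum((y - p) ** 2 for y, p in zip(Y, predicted))   # errors of estimate

r_squared = ss_regression / ss_total   # proportion ascribed to regression

print(f"regression {ss_regression:.3f} + residual {ss_residual:.3f} = total {ss_total:.1f}")
print(f"R squared = {r_squared:.4f}, 1 - R squared = {1 - r_squared:.4f}")
```

The two proportions printed on the last line are the quantities interpreted under A.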

The question of whether the regression equation does or does
not provide effective prediction may now be answered in terms
of a value judgment as to how much unexplained variation can
be tolerated in the particular circumstances in which one is working.
Anyone who thinks realistically about achievement of
students in a college course would think of many sources of
variation unrelated to scores on any prognostic test, as, for
example, differences in motivation and interest, in study habits,
in time available for study, in other abilities such as mathematical
background and general level of intelligence not measured by
these tests, in personal vicissitudes during the term. He would
also think of sources of measurement error affecting scores on the
criterion and on the prognostic tests such as differences in previous
experience of the student with such tests, his state of mind
and body on the day the tests were given, the brevity of the tests
and the particular selection of material in them, and so on. All
such sources of error contribute to the proportion of variation
$1 - R^2_{y.12}$. The person using the regression equation must decide
whether the reduction in variation represented as $R^2_{y.12}$ represents
an increase of information sufficient to justify the time required
for giving the tests, scoring them, and analyzing results. For
the present data of course it must be presumed that inclusion of
the third prognostic test in the regression equation, as is done
later in this chapter, will improve the predictions. For the ten
cases considered here, the proportional reduction in variance of
.0256 is too slight to be worth the trouble of giving and scoring
the tests. However, it will presently be seen that for 98 cases
the multiple correlation is much higher, in fact .66. This sample,
which was actually drawn at random, happened to be an extreme
deviate.

B. Interpretation in terms of sampling significance. The partition
of the sum of squares leads to a test of significance for the
multiple correlation coefficient. The sum of squares of regressed
values

$\dfrac{\Sigma(\tilde{Y} - \bar{Y})^2}{\sigma^2} = \dfrac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2}{\sigma^2}$

is distributed as $\chi^2$ with 2 degrees of freedom. The sum of squares
of residual errors

$\dfrac{\Sigma(Y - \tilde{Y})^2}{\sigma^2} = \dfrac{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2}{\sigma^2}$

is distributed as $\chi^2$ with N - 3 degrees of freedom. The two sums
of squares are independently distributed. Under the hypothesis
that $\rho_{y.12} = 0$, the two ratios

$\dfrac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2}{2}$  and  $\dfrac{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2}{N - 3}$

are estimates of the same variance $\sigma^2$. Therefore the ratio

(13.8)   $F = \dfrac{R^2_{y.12}\,\Sigma(Y - \bar{Y})^2/2}{(1 - R^2_{y.12})\,\Sigma(Y - \bar{Y})^2/(N - 3)} = \dfrac{R^2_{y.12}}{1 - R^2_{y.12}}\cdot\dfrac{N - 3}{2}$

has the F distribution with $n_1 = 2$ and $n_2 = N - 3$. This expression
gives a test of the hypothesis that in the population the
multiple correlation coefficient is zero.

For the 10 cases of Table 13.2, $F = \dfrac{.0256}{.9744}\cdot\dfrac{7}{2} = .09$. It is
apparent from interpretation A that the two tests under consideration,
difficult reading and artificial language, did not furnish
a usable prediction of midterm mark for the 10 cases observed and
it is apparent by interpretation B that the sample of 10 cases does
not contradict the hypothesis that in the population there is no
correlation between midterm mark and a weighted combination
of the two tests. The size of the sample affects the second
interpretation but not the first.

If there had been k predictor variables instead of 2 the ratio

(13.9)   $F = \dfrac{R^2}{1 - R^2}\cdot\dfrac{N - k - 1}{k}$

would be distributed as F with $n_1 = k$ and $n_2 = N - k - 1$.

The Normal Equations. In order to generalize the regression
equation with two predictors to a regression equation with any
number of predictors it will be helpful to explore the rationale
more carefully. Such an equation with 5 predictors may be written
as

(13.10)  $\tilde{Y} = A_{y.12345} + b_{y1.2345}X_1 + b_{y2.1345}X_2 + b_{y3.1245}X_3 + b_{y4.1235}X_4 + b_{y5.1234}X_5$

(13.11)  where $A_{y.12345} = \bar{Y} - b_{y1.2345}\bar{X}_1 - b_{y2.1345}\bar{X}_2 - b_{y3.1245}\bar{X}_3 - b_{y4.1235}\bar{X}_4 - b_{y5.1234}\bar{X}_5$

The pattern of subscripts is clear and can be easily adapted to
any number of variables. As before, A and all the b's are unknown
and must be computed from the group data, while all the X's are
known but vary from individual to individual. The criterion for
obtaining the b's is that the sum of the squares of the errors of
estimate, that is $\Sigma(Y - \tilde{Y})^2$, shall be made as small as possible.
If values of the b's are chosen so as to make this residual sum of
squares a minimum those same b's will make the correlation
between observed Y and estimated Y a maximum. Students
familiar with the calculus will know the customary mathematical
procedure for making the sum of squares a minimum. Others
must take the end product for granted. That end product is a set
of simultaneous linear equations which are called normal equations.
(The term "normal" here has no reference whatever to the normal
curve.) There are as many such equations in one problem as there
are unknown b's. For a 5-predictor problem the set of normal
equations is

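(13.12)

$b^*_{y1.2345} + r_{12}b^*_{y2.1345} + r_{13}b^*_{y3.1245} + r_{14}b^*_{y4.1235} + r_{15}b^*_{y5.1234} = r_{y1}$
$r_{12}b^*_{y1.2345} + b^*_{y2.1345} + r_{23}b^*_{y3.1245} + r_{24}b^*_{y4.1235} + r_{25}b^*_{y5.1234} = r_{y2}$
$r_{13}b^*_{y1.2345} + r_{23}b^*_{y2.1345} + b^*_{y3.1245} + r_{34}b^*_{y4.1235} + r_{35}b^*_{y5.1234} = r_{y3}$
$r_{14}b^*_{y1.2345} + r_{24}b^*_{y2.1345} + r_{34}b^*_{y3.1245} + b^*_{y4.1235} + r_{45}b^*_{y5.1234} = r_{y4}$
$r_{15}b^*_{y1.2345} + r_{25}b^*_{y2.1345} + r_{35}b^*_{y3.1245} + r_{45}b^*_{y4.1235} + b^*_{y5.1234} = r_{y5}$
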

It should be noted that the unknowns in these equations are b*'s
but the b's can be obtained from them by multiplying by the ratio
of standard deviations,

(13.13)  $b_{y4.1235} = b^*_{y4.1235}\,\dfrac{s_y}{s_4}$

and similarly for the other b's. All r's are known, having been
computed from the observed data. The correlation of every
variable with every other variable is required.

Numerical values for the b*'s can be obtained either by solving
the normal equations symbolically and substituting the known
values of the r's in the symbolic solutions or by substituting the
numerical values of the r's directly in the normal equations and
solving the resulting equations for the b*'s. The former method
produces a set of formulas for the b*'s, and when there are only
two or three predictor variables it is the simpler method to use.
It has already been illustrated for the data of Table 13.2. When
there are many predictor variables the formulas for the b*'s
become complicated and substitution in them involves a large
number of steps. The solution can then be obtained much more
economically by substituting the r's in the normal equations and
employing an efficient routine for solution of the resulting numerical
equations. Such a routine will be developed in later sections
of this chapter.
No matter which method is employed for obtaining the numerical
values of the b*'s, the multiple correlation can be found by
the formula

(13.14)  $R_{y.12345} = \sqrt{r_{y1}b^*_{y1.2345} + r_{y2}b^*_{y2.1345} + r_{y3}b^*_{y3.1245} + r_{y4}b^*_{y4.1235} + r_{y5}b^*_{y5.1234}}$

This formula is readily generalized to any number of variables.
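The numerical route just described (substitute the r's in the normal equations, solve, then apply Formula (13.14)) can be sketched for any number of predictors. The sketch below uses ordinary Gaussian elimination, not the routine developed later in the chapter, and the function names are assumptions made for illustration. It is tried on the rounded correlations of Table 13.2, and the F ratio of Formula (13.9) is added for the same data.

```python
import math


def solve_normal_equations(R, r):
    """Solve R b* = r by Gaussian elimination with back substitution."""
    m = len(r)
    aug = [list(R[i]) + [r[i]] for i in range(m)]   # augmented matrix
    for k in range(m):
        # Bring the largest remaining entry of column k to the pivot row.
        p = max(range(k, m), key=lambda i: abs(aug[i][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for i in range(k + 1, m):
            factor = aug[i][k] / aug[k][k]
            for j in range(k, m + 1):
                aug[i][j] -= factor * aug[k][j]
    b_star = [0.0] * m
    for k in reversed(range(m)):
        known = sum(aug[k][j] * b_star[j] for j in range(k + 1, m))
        b_star[k] = (aug[k][m] - known) / aug[k][k]
    return b_star


def multiple_correlation(r, b_star):
    """Formula (13.14): square root of the sum of r times b*."""
    return math.sqrt(sum(a * b for a, b in zip(r, b_star)))


def f_ratio(R_multiple, n_cases, k):
    """Formula (13.9), with k and n_cases - k - 1 degrees of freedom."""
    return R_multiple ** 2 / (1 - R_multiple ** 2) * (n_cases - k - 1) / k


# Rounded correlations from Table 13.2 (two predictors, ten cases).
R_predictors = [[1.0, 0.496],
                [0.496, 1.0]]
r_criterion = [0.073, 0.160]

b_star = solve_normal_equations(R_predictors, r_criterion)
R_multiple = multiple_correlation(r_criterion, b_star)
F = f_ratio(R_multiple, n_cases=10, k=2)

print([round(b, 4) for b in b_star], round(R_multiple, 2), round(F, 2))
```

For a 5-predictor problem only the inputs change: a 5 by 5 table of intercorrelations and five criterion correlations.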


EXERCISE 13.1 


1. For the cases in Table 13.1 the data of Table 13.3 were obtained.
By substitution in Formulas (13.1), (13.2), and (13.3) obtain the regression
equation to estimate midterm test scores from scores on the two prognostic
tests. Compare it with the equation obtained from the 10 cases of Table
13.2.

2. By substitution in Formula (13.6) obtain the multiple correlation
$R_{y.12}$ and compare it with the value obtained from the 10 cases of Table 13.2.

3. Test the hypothesis that $\rho_{y.12} = 0$. Why is that hypothesis so much
less credible in the light of the data from all 98 cases than in the light
of the data from 10 cases?

4. Suppose a criterion variable Y is to be estimated from three predictor 
variables named Хз, X, and X;. (a) Write a regression equation similar 
to equation (13.10) with subscripts properly placed. Note that the order 
in which the three predictors are arranged is unimportant and that there- 
fore the order of secondary subscripts is unimportant. (b) Write the three 
normal equations similar to those in Formula (13.12). (c) Write a formula
for R similar to Formula (13.14).

5. If a criterion trait Y is to be estimated from two other traits $X_4$
and $X_6$, write out with appropriate subscripts the formulas for

(a) $b^*_{y4.6}$     (e) A, the constant term in the regression equation
(b) $b^*_{y6.4}$     (f) $R_{y.46}$
(c) $b_{y4.6}$      (g) The normal equations
(d) $b_{y6.4}$


6. If a criterion trait X; is to be estimated from traits X; and X;, write 
out with appropriate subscripts the formulas for 


(a) b*; (c) bi; (е) A (g) The 3 normal equations 
(b) bins (4) bins (f) Ris 


The Doolittle Method of Solving the Normal Equations by
Successive Elimination. If there are more than two predictors
in the regression equation, it is a laborious process to compute
the regression coefficients by substituting the numerical values
of the r's in formulas obtained by solving the normal equations
symbolically, as was done in the preceding section for the case of
two predictors. With more than two predictors it is easier to
substitute the numerical values of the r's in the normal equations
themselves, and then to solve the resulting equations. There are
several commonly used methods for solving first degree equations,
and the normal equations are of first degree. One of the most
economical of such methods is named for M. H. Doolittle. He
improved on the method first used by Gauss and published his method
in the Report for 1878 of the U.S. Coast and Geodetic Survey.

This method simplifies computation by taking advantage of
certain symmetry relations among the correlations, namely that

$r_{12} = r_{21}$, $r_{13} = r_{31}$, etc.

The Doolittle Method will be developed in some detail for the
case in which there are only three predictors in order that the
student may understand its rationale. For this purpose we may
use the three prognostic tests of Table 13.1 as a basis for predicting
midterm test scores. The pertinent data for all 98 cases are
presented in Table 13.3. The three normal equations are

         $b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} = r_{y1}$
(13.15)  $r_{12}b^*_{y1.23} + b^*_{y2.13} + r_{23}b^*_{y3.12} = r_{y2}$
         $r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + b^*_{y3.12} = r_{y3}$


TABLE 13.3  Correlations of Midterm Marks and Scores on Three Prognostic
Tests for Students in a First Course in Statistical Method

                                                     Correlation
Test                          Mean      s       $X_1$    $X_2$    $X_3$     Y
$X_1$  Difficult reading      35.69    7.68             .321    .477    .357
$X_2$  Artificial language    46.43   11.03     .321            .539    .620
$X_3$  Arithmetic-Algebra     31.38    7.28     .477    .539            .518
Y    Midterm                52.46    9.33     .357    .620    .518


When the computed values of the r's are substituted in Equations
(13.15) and the results are simplified by writing $-u_1 = b^*_{y1.23}$,
$-u_2 = b^*_{y2.13}$ and $-u_3 = b^*_{y3.12}$, and both sides of the resulting
equations are multiplied by -1, we have the new set of equations

         (1)       $u_1 + .321u_2 + .477u_3 = -.357$
(13.16)  (2)   $.321u_1 + u_2 + .539u_3 = -.620$
         (3)   $.477u_1 + .539u_2 + u_3 = -.518$

The methods of solution of simultaneous equations studied in
high school algebra are applicable, but unless steps are set down
in an orderly routine there will be much waste effort. The student
should study carefully the routine set forth in Table 13.4 until
he understands it fully and then should compare it with Table 13.5
so that he may understand how to generalize that abbreviated
solution to a pattern accommodating any number of predictor
variables. In Table 13.5 letters representing variables $u_1$, $u_2$
and $u_3$ have been omitted and only their coefficients printed.
Certain unnecessary terms have also been omitted as well as signs
of operation and equality.


TABLE 13.4  Detailed Solution of Equations (13.16)

THE FORWARD SOLUTION

Line     Step
[1.1]    Write equation 1               u1 + .321 u2 + .477 u3 = -.357
[1.2]    [1.1]·(-1)                    -u1 - .321 u2 - .477 u3 =  .357
[2.1]    Write equation 2          .321 u1 +      u2 + .539 u3 = -.620
[2.2]    [1.1]·(-.321)            -.321 u1 - .103 u2 - .153 u3 =  .115
[2.3]    [2.1] + [2.2]                   0 + .897 u2 + .386 u3 = -.505
[2.4]    [2.3] ÷ (-.897)                 0 -      u2 - .430 u3 =  .563
[3.1]    Write equation 3          .477 u1 + .539 u2 +      u3 = -.518
[3.2]    [1.1]·(-.477)            -.477 u1 - .153 u2 - .228 u3 =  .170
[3.3]    [2.3]·(-.430)                   0 - .386 u2 - .166 u3 =  .217
[3.4]    [3.1] + [3.2] + [3.3]           0 +      0  + .606 u3 = -.131
[3.5]    [3.4] ÷ (-.606)                 0 +      0  -      u3 =  .216

THE BACK SOLUTION

[1.1B]   Enter [3.5]                                      - u3 =  .216
[2.1B]   Enter [2.4]                           - u2 - .430 u3 =  .563
[2.2B]   [1.1B]·(-.430)                                .430 u3 = -.093
[2.3B]   [2.1B] + [2.2B]                       - u2 +    0    =  .470
[3.1B]   Enter [1.2]                  - u1 - .321 u2 - .477 u3 =  .357
[3.2B]   [1.1B]·(-.477)                                .477 u3 = -.103
[3.3B]   [2.3B]·(-.321)                       .321 u2 +    0   = [illegible]
[3.4B]   [3.1B] + [3.2B] + [3.3B]     - u1 +    0   +    0    =  .102

The solution proceeds in cycles, there being as many cycles
as there are unknowns, and one unknown being eliminated in each
cycle. The lines have been numbered in such a way as to indicate
both the cycle and the row within the cycle. Thus [3.2] indicates
the 2d row in cycle 3.

The forward solution proceeds by successive elimination of
one unknown after another until a value is found for one of the
unknowns. Then the back solution consists of taking the values
of those unknowns which have already been found and substituting
them in equations to obtain values of the remaining unknowns.


TABLE 13.5  Abbreviated Solution of Equations (13.16)

THE FORWARD SOLUTION

Line     Step                        X1       X2       X3   |    Y    |   Check
[1.1]    Insert r1j's              1.000     .321     .477  |  -.357  |   1.441
[1.2]    [1.1]·(-1)               -1.000    -.321    -.477  |   .357  |  -1.441
[2.1]    Insert r2j's               .321    1.000     .539  |  -.620  |   1.240
[2.2]    [1.1]·(-.321)                      -.103    -.153  |   .115  |   -.463
[2.3]    [2.1]+[2.2]                         .897     .386  |  -.505  |    .777
[2.4]    [2.3]÷(-.897)                     -1.000    -.430  |   .563  |   -.866
[3.1]    Insert r3j's               .477     .539    1.000  |  -.518  |   1.498
[3.2]    [1.1]·(-.477)                               -.228  |   .170  |   -.687
[3.3]    [2.3]·(-.430)                               -.166  |   .217  |   -.334
[3.4]    [3.1]+[3.2]+[3.3]                            .606  |  -.131  |    .477
[3.5]    [3.4]÷(-.606)                              -1.000  |   .216  |   -.787

THE BACK SOLUTION

[1.1B]   Enter [3.5]                                -1.000  |   .216  |   -.787
[2.1B]   Enter [2.4]                       -1.000    -.430  |   .563  |   -.866
[2.2B]   [1.1B]·(-.430)                               .430  |  -.093  |    .338
[2.3B]   [2.1B]+[2.2B]                     -1.000           |   .470  |  [illegible]
[3.1B]   Enter [1.2]              -1.000    -.321    -.477  |   .357  |  -1.441
[3.2B]   [1.1B]·(-.477)                               .477  |  -.103  |    .375
[3.3B]   [2.3B]·(-.321)                      .321           | [illegible] | [illegible]
[3.4B]   [3.1B]+[3.2B]+[3.3B]     -1.000                    |   .102  |  [illegible]

$b^*_{y3.12} = .216$,  $b^*_{y2.13} = .470$,  $b^*_{y1.23} = .102$

$R_{y.123} = \sqrt{r_{y1}b^*_{y1.23} + r_{y2}b^*_{y2.13} + r_{y3}b^*_{y3.12}} = .663$

Each cycle may be thought of as having a first line, a last line,
a next-to-the-last line, and a set of "other" lines. (The first
cycle is an exception, having only 2 lines.) The first step in each
cycle of the forward solution is to record the values of the appropriate
r's. In the third cycle these are the correlations of $X_3$ with
the other variables, and similarly in a problem involving many
predictors the correlations with $X_k$ are recorded on the first line
of cycle k. The column headed $X_1$ receives the correlations of $X_1$
with other variables and in general in a problem in which there
are many predictors the column headed $X_j$ receives correlations
of $X_j$ with other variables. The criterion variable is placed at
the right of the predictors. This step is equivalent to recording
the original equations in the detailed solution.

Following the line on which r's are entered, each cycle has one
line obtained from each of the preceding cycles, as worked out in
detail in Table 13.6 for cycle 5. Thus in a problem with many
predictors, cycle k would have k - 1 such lines. Each of these
lines is obtained by multiplying the entries in the next-to-the-last
line in one of the preceding cycles by an entry in the last line of
that cycle. Thus line [1.1] is multiplied by an entry in [1.2] and
line [2.3] by an entry in [2.4]. In a problem with many predictors
this would be continued until all the preceding cycles had been
covered. Each of the multiplying numbers is taken from column
$X_k$ when one is working in cycle k and entries to the left of column
$X_k$ are not recorded. This is the equivalent of multiplying the
coefficients of the unknowns in a particular equation by a pertinent
or helpful constant so that when equations are added one unknown
will disappear.


TABLE 13.6  Detailed Steps for Cycle 5 in Forward Solution

[5.1] Enter the correlation of $X_5$ with each of the other variables, placing
      $r_{51}$ in column $X_1$, $r_{52}$ in column $X_2$, etc. Enter the sum of all
      these correlations in the check column.

[5.2] Multiply line [1.1] in cycle 1 by the entry standing in column $X_5$ of
      line [1.2]. Do not record any values to the left of column $X_5$.
      Multiply the entry in check column of [1.1] by the same multiplier from
      line [1.2] and enter the result in check column of [5.2].

[5.3] Multiply line [2.3] in cycle 2 (including the entry in check column) by
      the entry standing in column $X_5$ of line [2.4], but do not record any
      values to the left of column $X_5$.

[5.4] Multiply line [3.4] in cycle 3 (including the entry in check column) by
      the entry standing in column $X_5$ of line [3.5].

[5.5] Multiply line [4.5] in cycle 4 (including the entry in check column) by
      the entry standing in column $X_5$ of line [4.6].

[5.6] Add lines [5.1] to [5.5].

[5.7] Divide line [5.6] by the first entry in that line, that is, the entry in
      column $X_5$, after having first changed the sign of the divisor. The first
      entry in [5.7] should be -1.000. The entry now standing in the check
      column should be equal to the sum of the other entries in line [5.7].


The next-to-the-last line in each cycle is the sum of all the preceding
lines in that cycle. Examination of the detailed solution will
show that at this step one unknown is eliminated from the equations.

The last line in each cycle is obtained by multiplying each
entry in the preceding line by the negative reciprocal of the first
entry. This step makes the coefficient of the leading term -1, as
it is in the detailed solution.

In the back solution the first line of each cycle is the last line of a
cycle in the forward solution, the cycles being taken in reverse order.
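The cycles just described can be written out as a short program. The following is a sketch only, not a transcription of the worksheet: the function name `doolittle` and the list layout are assumptions of this example, each line is stored from column $X_k$ rightwards (predictor columns, then the criterion column, then the check column), and nothing is rounded, so the third decimal may differ from the hand computation.

```python
def doolittle(R, r):
    """Abbreviated Doolittle solution of the normal equations R b* = r.

    Works with u = -b*, as in Equations (13.16), so each criterion
    correlation is entered with its sign changed.
    """
    m = len(r)
    pivots = []   # next-to-the-last line of each cycle
    lasts = []    # last line of each cycle (leading entry -1)
    for k in range(m):
        # First line of the cycle: r's from column X_k rightwards,
        # the criterion entry, and the check sum of the whole row.
        line = list(R[k][k:]) + [-r[k], sum(R[k]) - r[k]]
        for c in range(k):
            mult = lasts[c][k - c]   # entry of last line of cycle c in column X_k
            for j in range(len(line)):
                line[j] += mult * pivots[c][k - c + j]
        pivots.append(line)
        lasts.append([-v / line[0] for v in line])

    # Back solution: each last line reads
    # -u_k + (coefficients of later u's) = right-hand side, with b* = -u.
    b_star = [0.0] * m
    for k in reversed(range(m)):
        last = lasts[k]
        b_star[k] = last[m - k] + sum(last[j - k] * b_star[j] for j in range(k + 1, m))
    return b_star, pivots, lasts


# Correlations of Table 13.3.
R = [[1.0, 0.321, 0.477],
     [0.321, 1.0, 0.539],
     [0.477, 0.539, 1.0]]
r_y = [0.357, 0.620, 0.518]

b_star, pivots, lasts = doolittle(R, r_y)
print([round(b, 3) for b in b_star])
for line in pivots:
    # Check column: last entry should equal the sum of the other entries.
    print([round(v, 3) for v in line])
```

The lists printed at the end correspond to lines [1.1], [2.3] and [3.4] of the abbreviated solution, with the check entry last.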

Checks. Substitution of the computed values of the standard
regression coefficients, the b*'s, in the normal equations provides
a check on all b*'s at once. For the data under consideration
substitution in the equations

$b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} = r_{y1}$
$r_{12}b^*_{y1.23} + b^*_{y2.13} + r_{23}b^*_{y3.12} = r_{y2}$
$r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + b^*_{y3.12} = r_{y3}$

would yield the following check:

.102 + (.321)(.470) + (.477)(.216) = .3559 while $r_{y1}$ = .357
(.321)(.102) + (.470) + (.539)(.216) = .6192 while $r_{y2}$ = .620
(.477)(.102) + (.539)(.470) + (.216) = .5180 while $r_{y3}$ = .518

Most experienced computers like to check their work step by
step, so that if a mistake is made it may be caught immediately.
For this purpose a check column is added at the right of the
worksheet. For the first line in each cycle, the entry in this column is
the sum of the r's entered on that line. All other entries in the
column are obtained by the same set of directions which apply to
the other columns. The check consists of ascertaining that the
entry in the check column is equal to the sum of the other entries
for the last line and for the next-to-the-last line in each cycle.
In general, that relation will not hold for the other lines in the
forward solution because of the omission of certain terms. It
holds for all lines in the back solution. Even if the check column
indicates no error, the check by substitution in the normal
equations should be made because the entries in the check column do
not provide an infallible check.
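The check by substitution is easily mechanized. The sketch below assumes the correlations of Table 13.3 and the three-decimal b*'s found by the Doolittle solution; the variable names are illustrative.

```python
import math

# Intercorrelations of the predictors and criterion correlations (Table 13.3).
R = [[1.0, 0.321, 0.477],
     [0.321, 1.0, 0.539],
     [0.477, 0.539, 1.0]]
r_y = [0.357, 0.620, 0.518]

# Standard regression coefficients as obtained from the Doolittle solution.
b_star = [0.102, 0.470, 0.216]

# Left-hand sides of the three normal equations.
reproduced = [sum(R[i][j] * b_star[j] for j in range(3)) for i in range(3)]
for got, wanted in zip(reproduced, r_y):
    print(f"{got:.4f} while r = {wanted:.3f}")

# Multiple correlation from the same b*'s.
R_multiple = math.sqrt(sum(a * b for a, b in zip(r_y, b_star)))
print(f"R = {R_multiple:.3f}")
```

Each reproduced value agrees with the corresponding criterion correlation to within rounding in the third decimal, which is all that three-place b*'s can be expected to give.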

Fisher Modification of the Doolittle Method. The procedures 
described in the preceding section serve to obtain a multiple 
regression equation, and a multiple correlation coefficient and to 
test the significance of that correlation coefficient. To test the 
significance of the b*'s in the regression equation another approach
is necessary. The method to be described in this section will
provide the standard error of a coefficient of partial regression and 
will thus make it possible (1) to test hypotheses about such 
coefficients, (2) to make an interval estimate of the population 
regression coefficient, and (3) to predict several different criterion 
variables from the same set of predictors. 

The discussion will be carried out for the case of three predictors 
and illustrated by the data already used to predict midterm test 
scores from three prognostic tests. Here we shall (1) test hypoth- 
eses concerning the regression coefficients in the population, 
(2) make interval estimates concerning them, and (3) obtain two 
regression equations from the same predictors, one with midterm 
test scores as criterion and one with final examination grades as 
criterion. For these purposes certain supplementary values are 
needed. 

Two explanations will now be offered. Explanation A should 
be helpful for persons who have studied matrix algebra and 
incomprehensible to most others. The latter need not look at 
it but may go on at once to explanation B. Persons who can read 
explanation A will not need explanation B. Those who cannot 
read explanation A will probably have to take the routine more 
or less on faith and need not strain to understand the underlying 
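rationale.


A. Explanation in terms of matrices. The normal Equations
(13.15) may be written in matrix notation as

(13.17)  $\begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \begin{pmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{pmatrix} = \begin{pmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{pmatrix}$

When expanded, this becomes Equations (13.21) of explanation B.
If both sides are premultiplied by the inverse of the matrix of r's,
we have

(13.18)  $\begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix}^{-1} \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \begin{pmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{pmatrix} = \begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix}^{-1} \begin{pmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{pmatrix}$

Hence

(13.19)  $\begin{pmatrix} b^*_{y1.23} \\ b^*_{y2.13} \\ b^*_{y3.12} \end{pmatrix} = \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix} \begin{pmatrix} r_{y1} \\ r_{y2} \\ r_{y3} \end{pmatrix}$

When expanded (13.19) becomes Equations (13.22) of explanation
B. The goal now is to obtain the values of the C's which are the
elements of the inverse matrix. The product of the matrix of
r's by its inverse, the matrix of C's, is the identity matrix

(13.20)  $\begin{pmatrix} 1 & r_{12} & r_{13} \\ r_{12} & 1 & r_{23} \\ r_{13} & r_{23} & 1 \end{pmatrix} \begin{pmatrix} C_{11} & C_{12} & C_{13} \\ C_{21} & C_{22} & C_{23} \\ C_{31} & C_{32} & C_{33} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

When expanded (13.20) becomes Equations (13.23) of explanation
B. There are nine equations in (13.20) and 9 unknown C's to be
computed. However, because of the symmetrical nature of the
matrix of r's, the matrix of C's is also symmetrical and $C_{12} = C_{21}$,
$C_{13} = C_{31}$, $C_{23} = C_{32}$. The C's which are elements of the inverse
matrix are sometimes called elements of the inverse solution, or from
relations (13.19), multipliers or inverse multipliers. They will be
called multipliers in this book.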
In section C a routine for obtaining these C's will be developed.

B. Explanation without matrices. The normal equations

         $1\cdot b^*_{y1.23} + r_{12}b^*_{y2.13} + r_{13}b^*_{y3.12} = r_{y1}$
(13.21)  $r_{12}b^*_{y1.23} + 1\cdot b^*_{y2.13} + r_{23}b^*_{y3.12} = r_{y2}$
         $r_{13}b^*_{y1.23} + r_{23}b^*_{y2.13} + 1\cdot b^*_{y3.12} = r_{y3}$

might be thought of as equations to obtain the values $r_{y1}$, $r_{y2}$, $r_{y3}$
from the b*'s. However what is usually needed are the inverse
expressions for the b*'s in terms of the correlations $r_{y1}$, $r_{y2}$, $r_{y3}$
and the correlations among the predictors $r_{12}$, $r_{13}$, $r_{23}$. Equations
(13.21) may be solved for the b*'s but when there are more than
two predictors the symbolic solutions are complicated. These
solutions may be indicated by the set of inverse equations

         $b^*_{y1.23} = C_{11}r_{y1} + C_{12}r_{y2} + C_{13}r_{y3}$
(13.22)  $b^*_{y2.13} = C_{21}r_{y1} + C_{22}r_{y2} + C_{23}r_{y3}$
         $b^*_{y3.12} = C_{31}r_{y1} + C_{32}r_{y2} + C_{33}r_{y3}$

where each C represents an expression involving the correlations
$r_{12}$, $r_{13}$, $r_{23}$. These expressions for the C's become more and more
complex as the number of predictor variables increases. Certain
convenient relations among the C's and r's can be readily established
by matrix algebra but will probably have to be taken on
faith by the person who does not understand that subject. These
relations are

         $1\cdot C_{11} + r_{12}C_{21} + r_{13}C_{31} = 1$
         $r_{12}C_{11} + 1\cdot C_{21} + r_{23}C_{31} = 0$
         $r_{13}C_{11} + r_{23}C_{21} + 1\cdot C_{31} = 0$

         $1\cdot C_{12} + r_{12}C_{22} + r_{13}C_{32} = 0$
(13.23)  $r_{12}C_{12} + 1\cdot C_{22} + r_{23}C_{32} = 1$
         $r_{13}C_{12} + r_{23}C_{22} + 1\cdot C_{32} = 0$

         $1\cdot C_{13} + r_{12}C_{23} + r_{13}C_{33} = 0$
         $r_{12}C_{13} + 1\cdot C_{23} + r_{23}C_{33} = 0$
         $r_{13}C_{13} + r_{23}C_{23} + 1\cdot C_{33} = 1$

The nine Equations (13.23) can be compactly written as

         $1\cdot C_{1j} + r_{12}C_{2j} + r_{13}C_{3j} = 1, 0, 0$
(13.24)  $r_{12}C_{1j} + 1\cdot C_{2j} + r_{23}C_{3j} = 0, 1, 0$
         $r_{13}C_{1j} + r_{23}C_{2j} + 1\cdot C_{3j} = 0, 0, 1$

The C's may be called elements of the inverse solution or from the
relations (13.23) multipliers, or inverse multipliers. They will be
called multipliers in this book.

C. Routine for computing the multipliers. The goal is to obtain
values of the C's which may be substituted in Equations (13.22)
to obtain the b*'s. The computational form is set up in the manner
already used for the abbreviated solution of Table 13.5 except that
the column for criterion Y is omitted and in its place are several
columns labeled $C_1$, $C_2$, $C_3$. In a problem having m predictor
variables there would be m such columns. For a 3-predictor
problem the column labeled $C_1$ will eventually lead to the values
$C_{11}$, $C_{21}$, and $C_{31}$, the column labeled $C_2$ will lead to the values
$C_{12}$, $C_{22}$ and $C_{32}$, and the column labeled $C_3$ will lead to $C_{13}$, $C_{23}$
and $C_{33}$.

The numbers to be entered on the first lines of the forward
solution are, for m predictors, as follows:

Line      X1     X2     X3    ...    Xm   |   C1    C2    C3   ...   Cm
[1.1]      1     r12    r13   ...    r1m  |   -1     0     0   ...    0
[2.1]     r12     1     r23   ...    r2m  |    0    -1     0   ...    0
[3.1]     r13    r23     1    ...    r3m  |    0     0    -1   ...    0
 ...
[m.1]     r1m    r2m    r3m   ...     1   |    0     0     0   ...   -1

The succeeding steps are exactly like those of the previous solution.
Values of the C's will appear on the last lines of various
cycles in the columns headed $C_1$, $C_2$, $C_3$. These have been taken
out and set down in a separate tabulation where their symmetry
can be noted. That symmetry provides one check on the computations.
The b*'s are now obtained by substituting the C's in
Equations (13.22). These are seen to agree with the previously
computed values of the b*'s except for small differences in the last
digit due to rounding errors. The complete solution appears in
Table 13.7.


TABLE 13.7  Fisher-Doolittle Procedure for Computing the Multipliers
[Worksheet giving the forward solution and the back solution; the
tabulated entries are illegible in this copy.]


Values of the C's Obtained from Fisher-Doolittle Solution

    C1        C2        C3
   1.305     -.117     -.559
   -.117     1.420     -.710
   -.559     -.710     1.650

Values of b*'s to Predict Midterm Test Scores

                                                                Previously
Obtained by substituting C's in (13.22)                         obtained
b*_y1.23 =  1.305(.357) -  .117(.620) -  .559(.518) = .104        .102
b*_y2.13 = - .117(.357) + 1.420(.620) -  .710(.518) = .471        .470
b*_y3.12 = - .559(.357) -  .710(.620) + 1.650(.518) = .215        .216

$b_{y1.23} = b^*_{y1.23}\,\dfrac{s_y}{s_1} = .124$;  $b_{y2.13} = .398$;  $b_{y3.12} = .277$

$A = \bar{Y} - b_{y1.23}\bar{X}_1 - b_{y2.13}\bar{X}_2 - b_{y3.12}\bar{X}_3 = 20.86$

Then $\hat{Y} = 20.86 + .124X_1 + .398X_2 + .277X_3$

Regression Equation for a Different Criterion and the Same
Predictors. The same set of C's can now be used to obtain the
regression equation to predict a different criterion variable, if
the correlations of that criterion with the predictors are available.
From the data of Table 13.1, the correlations of the final examination
with each of the three predictors can be found. Let final
examination scores be designated Z. Then

$r_{z1} = .325$    $r_{z2} = .508$    $r_{z3} = .502$
$s_z = 9.47$    $\bar{Z} = 49.15$

b*_z1.23 =  1.305(.325) -  .117(.508) -  .559(.502) = .084
b*_z2.13 = - .117(.325) + 1.420(.508) -  .710(.502) = .327
b*_z3.12 = - .559(.325) -  .710(.508) + 1.650(.502) = .286

Check: $b^*_{z1.23} + r_{12}b^*_{z2.13} + r_{13}b^*_{z3.12} = .3254$ and $r_{z1} = .325$
       $r_{12}b^*_{z1.23} + b^*_{z2.13} + r_{23}b^*_{z3.12} = .5081$ and $r_{z2} = .508$
       $r_{13}b^*_{z1.23} + r_{23}b^*_{z2.13} + b^*_{z3.12} = .5023$ and $r_{z3} = .502$

Then $R^2_{z.123} = r_{z1}b^*_{z1.23} + r_{z2}b^*_{z2.13} + r_{z3}b^*_{z3.12} = .3370$

$R_{z.123} = \sqrt{.3370} = .58$

To obtain the regression equation we need

$b_{z1.23} = b^*_{z1.23}\,\dfrac{s_z}{s_1} = .104$
$b_{z2.13} = b^*_{z2.13}\,\dfrac{s_z}{s_2} = .281$
$b_{z3.12} = b^*_{z3.12}\,\dfrac{s_z}{s_3} = .372$

$A = \bar{Z} - b_{z1.23}\bar{X}_1 - b_{z2.13}\bar{X}_2 - b_{z3.12}\bar{X}_3 = 20.77$

$\hat{Z} = 20.77 + .104X_1 + .281X_2 + .373X_3$
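
The point of this section is that the C's depend only on the predictors, so a new criterion needs nothing more than a fresh set of sums of products. A minimal sketch of that reuse follows, taking the three-decimal C's and the correlations of Z with the predictors as quoted in the text; the helper name `standard_weights` is hypothetical.

```python
import math

# Multipliers from the Fisher-Doolittle solution (symmetric array).
C = [[1.305, -0.117, -0.559],
     [-0.117, 1.420, -0.710],
     [-0.559, -0.710, 1.650]]


def standard_weights(C, r_crit):
    """b*_i = sum over j of C_ij * r_(criterion, j)."""
    return [sum(c * r for c, r in zip(row, r_crit)) for row in C]


# Correlations of the final examination Z with the three predictors.
r_z = [0.325, 0.508, 0.502]

b_star_z = standard_weights(C, r_z)
# Squared multiple correlation: R^2 = sum of r_zj * b*_zj.
R2_z = sum(r * b for r, b in zip(r_z, b_star_z))
R_z = math.sqrt(R2_z)

print(["%.3f" % b for b in b_star_z], "%.4f" % R2_z, "%.2f" % R_z)
```

Conversion to raw-score weights would further require the standard deviations of the predictors, which are not restated in this section, so the sketch stops at the standard weights and the multiple correlation.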
Tests of Significance for Partial Regression Coefficients. For
this purpose we shall use a statistical model analogous to the one
used in Chapter 10 for testing hypotheses about regression coefficients
in problems with only a single predictor variable. Generalization
will be made to a subset of samples in which the values of
the predictors are exactly the same as in the observed sample.
It will be assumed that values of the criterion variable Y vary
from sample to sample with mean depending on the predictors
and described for three predictors by the equation

(13.25)  $E(Y) = \alpha + \beta_{y1.23}X_1 + \beta_{y2.13}X_2 + \beta_{y3.12}X_3$

The values of Y fluctuate around their expected value, there
being a normal distribution of Y's for each different combination
of $X_1$, $X_2$ and $X_3$, and all such Y distributions having the same
variance.

The β's appearing in Formula (13.25) are the population parameters
for which the b's are sample estimates. We shall use the
symbol β* as the population parameter corresponding to the sample
b*. The hypothesis that β* = 0 (hence β = 0) may be tested
directly by use of b*. The computation of confidence intervals
can be made more simply for the β's than for the β*'s.

Symbolically the relation between the β*'s and the b*'s is

(13.26)  $\beta^*_{y1.23} = E(b^*_{y1.23})$,  $\beta^*_{y2.13} = E(b^*_{y2.13})$,  and  $\beta^*_{y3.12} = E(b^*_{y3.12})$

Then the statistics

(13.27)
$$\frac{b^*_{y1.23} - \beta^*_{y1.23}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{11}}{N-4}}},\qquad
\frac{b^*_{y2.13} - \beta^*_{y2.13}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{22}}{N-4}}},\qquad\text{and}\qquad
\frac{b^*_{y3.12} - \beta^*_{y3.12}}{\sqrt{\dfrac{(1 - R^2_{y.123})\,C_{33}}{N-4}}}$$

all have "Student's" distribution with N - 4 degrees of freedom.
If m predictors were used in the equation, N - 4 would be replaced
by N - m - 1 and the statistic would have N - m - 1 degrees of
freedom.

Returning now to the problem of predicting midterm score
from the three prognostic tests, it will be of interest to test (for
each of the three regression coefficients separately) the hypothesis
that it is zero in a hypothetical population of students of which
these 98 students may be considered a random sample. The
regression equation has already been obtained in the two forms

$$\frac{\hat{Y} - \bar{Y}}{s_y} = .102\,\frac{X_1 - \bar{X}_1}{s_1} + .470\,\frac{X_2 - \bar{X}_2}{s_2} + .216\,\frac{X_3 - \bar{X}_3}{s_3}$$

and  $\hat{Y} = .124X_1 + .398X_2 + .277X_3 + 20.86$

The multiple correlation coefficient related to these equations
has been found to be $R_{y.123} = .663$ and this has been shown to be
significantly different from zero. The multipliers

$C_{11} = 1.305$,  $C_{22} = 1.420$,  $C_{33} = 1.650$

have been found.

To test the three null hypotheses the following statistics are
computed, each distributed as t with N - m - 1 = 98 - 3 - 1 = 94
degrees of freedom:

$$\frac{b^*_{y1.23} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{11}}{N-4}}} = \frac{.102}{\sqrt{\dfrac{(.5604)(1.305)}{94}}} = 1.17$$

$$\frac{b^*_{y2.13} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{22}}{N-4}}} = \frac{.470}{\sqrt{\dfrac{(.5604)(1.420)}{94}}} \approx 5.11$$

$$\frac{b^*_{y3.12} - 0}{\sqrt{\dfrac{(1 - R^2)\,C_{33}}{N-4}}} = \frac{.216}{\sqrt{\dfrac{(.5604)(1.650)}{94}}} \approx 2.18$$

Interval estimates for these regression coefficients may be
made by the method previously used in making an interval
estimate for the mean and for other variables having a normal
distribution. The standard error of $b_{y1.23}$ is

(13.28)  $s_{b_{y1.23}} = \sqrt{\dfrac{(1 - R^2_{y.123})\,C_{11}}{N-4}}\;\dfrac{s_y}{s_1}$

To state the formula in more general terms we may for convenience
drop the secondary subscripts and merely use R for the coefficient
of multiple correlation and $b_j$ for the partial regression coefficient
of Y on $X_j$. If there are m predictor variables, the standard error
of $b_j$ is

(13.29)  $s_{b_j} = \sqrt{\dfrac{(1 - R^2)\,C_{jj}}{N-m-1}}\;\dfrac{s_y}{s_j}$

Then the inequality

$$t_{\alpha/2} < \frac{b_j - \beta_j}{s_{b_j}} < t_{1-\alpha/2}$$

will produce an interval estimate for $\beta_j$ with confidence coefficient
$1 - \alpha$:

(13.30)
$$b_j + t_{\alpha/2}\sqrt{\frac{(1 - R^2)\,C_{jj}}{N-m-1}}\;\frac{s_y}{s_j} < \beta_j < b_j + t_{1-\alpha/2}\sqrt{\frac{(1 - R^2)\,C_{jj}}{N-m-1}}\;\frac{s_y}{s_j}$$

Now let us apply this procedure to obtain a confidence interval
for each of the three β's in the data under consideration. If a
95% confidence interval is decided upon, α = .05 and $t_{.975} = 1.96$
may be read from a table of normal probability as there are 94
degrees of freedom. The interval estimates are:

-.085 < $\beta_{y1.23}$ < .328
 .245 < $\beta_{y2.13}$ < .550
 .026 < $\beta_{y3.12}$ < .526


Elimination of One Predictor from the Regression Equation.
Either the test of the null hypothesis or the interval estimate for
the population values of the regression coefficients would produce
a lively suspicion that $X_1$, the reading test, may not be making
any real contribution to the prediction of success in first term
statistics since the observed b* = .102 might quite possibly be a
chance departure from β* = 0. The other two measures must be
considered to be making a significant contribution to the prediction.
Therefore it would be wise to make a search for a more
useful substitute for the reading test or, if testing time is at a
premium, to consider omitting that test altogether. (A passing comment might be in order,
however, to the effect that while the reading test has made very 
little contribution to the general prediction of marks in the 
statistics course, individual students have often expressed appreci- 
ation of the opportunity to discover a weakness by its aid. Some 
students who have acquired the habit of reading very rapidly 
need to learn that in this subject one must read more deliberately, 
must weigh and compare and reread and reflect.) 

How much would the predictive value of the regression equation
for these 98 students be reduced by dropping variable $X_1$?
For three predictors, the error sum of squares is

$(1 - R^2_{y.123})\,\Sigma(Y - \bar{Y})^2$

and for two predictors it will be $(1 - R^2_{y.23})\,\Sigma(Y - \bar{Y})^2$. The
relation between these is

(13.31)  $1 - R^2_{y.23} = 1 - R^2_{y.123} + \dfrac{(b^*_{y1.23})^2}{C_{11}}$

or, in general, if there are m predictors and predictor $X_j$ is dropped
from the equation, the sum of squares for residual errors is
increased by $\dfrac{1}{C_{jj}}\,(b^*_{yj.12 \ldots m})^2\,\Sigma(Y - \bar{Y})^2$.
For $X_1$ of this problem

$\dfrac{1}{C_{11}}\,(b^*_{y1.23})^2 = \dfrac{1}{1.305}\,(.102)^2 = .008$

A relative increase of only $.008\,\Sigma(Y - \bar{Y})^2$ in the error sum of
squares appears negligible.

Then $1 - R^2_{y.23} = 1 - (.663)^2 + .008 = .568$
and $R_{y.23} = \sqrt{.432} = .657$
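
As a check on the arithmetic, the reduced multiple correlation can be obtained two ways: by Formula (13.31), and directly from the zero-order correlations of Y with the two retained predictors. The direct route below uses the standard two-predictor expression for a squared multiple correlation, which is brought in for comparison and is not part of the text's derivation; all numbers are the rounded values quoted in this chapter.

```python
import math

R_full = 0.663          # R_y.123
b_star_1 = 0.102        # b*_y1.23, the weight of the predictor being dropped
C_11 = 1.305            # matching diagonal multiplier

# Formula (13.31): the proportion of unexplained variance rises by b*^2 / C_11.
increase = b_star_1 ** 2 / C_11
one_minus_R2_reduced = 1 - R_full ** 2 + increase
R_reduced = math.sqrt(1 - one_minus_R2_reduced)

# Direct two-predictor computation from zero-order correlations.
r_y2, r_y3, r_23 = 0.620, 0.518, 0.539
R2_direct = (r_y2 ** 2 + r_y3 ** 2 - 2 * r_y2 * r_y3 * r_23) / (1 - r_23 ** 2)

print("increase %.3f, R_y.23 %.3f, direct R_y.23 %.3f"
      % (increase, R_reduced, math.sqrt(R2_direct)))
```

The two routes agree to about three decimals; the small residue comes from the rounding of R, b* and C.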


Some reasons have already been suggested as to why prediction 
of achievement by means of prognostic tests can never be perfect. 
In these data 43% of the variance of midterm scores is related to
variation in the two tests $X_2$ and $X_3$, 57% to variation in other
traits. This may seem inconsequential to persons who have not 
worked with such data, but few persons who have tried to predict 
achievement scores would consider the gain in knowledge a small 
one. In interpreting the multiple correlation it should be under- 
stood that the classes were conducted in a manner designed to 
destroy the validity of the prediction. The students with the best 
prognosis were in a larger class and one which studied topics not 
covered by the other sections and not included in the class tests. 
Some of the students for whom prognosis was poor were in a very 
small class where they had more opportunity for individual 
attention, others were in a section which met four hours a week 
whereas those with better prognosis were in a large class which 
met three hours a week. 

Partial Correlation. Sometimes the computed coefficient 
of correlation between two variables is misleading because there 
is little or no intrinsic correlation between them beyond what is 
induced by their common dependence upon a third variable (or 
upon several others). One might make a very long list of traits 
which increase with age between 12 and 18 years of age, such as 
height, weight, strength of grip, vocabulary, interest in public 
affairs, money spent on wearing apparel, size of shoe, knowledge 
of public affairs, interest in the opposite sex, ability to understand 
abstract material, etc. Any two of these would almost certainly 
show positive correlation, but the correlation between money
spent on wearing apparel and ability to understand abstract
material would probably disappear if the effect of variation in
age could be eliminated. In his Statistical Method, S. Florence
remarks:


The contention that most people die in their beds and that therefore, a bed 
is the most unhealthy of places is a familiar example of failure to apply 
partial association. Going to bed is a usual consequence of being ill and 
dying occurs after a period of illness. Both phenomena are normal results 
of illness and to obtain a scientific proof it would be necessary to select
only cases of illness and within the universe of illnesses to see whether 
those going to bed really die in greater proportion than those not going 
to bed.* 


The data on the prediction of success in first term statistics 
do not display the striking contrasts which sometimes appear 
between zero order correlations and partial correlations. How- 
ever they may be utilized to demonstrate what the formula for 
partial correlation means. 

For the 98 cases of Table 13.1 the correlation between scores
on the midterm test Y and scores on the arithmetic-algebra
placement test was $r_{y3} = .52$. Is there a relation between these
two traits which cannot be explained in terms of their common
relationship to scores on the artificial language test?

Table 13.8 shows the scores on these three tests for a random
sample of 10 cases out of Table 13.1. It also shows the predictions
which would be made for Y and for $X_3$ by the regression equations

$\hat{Y} = b_{y2}X_2 + (\bar{Y} - b_{y2}\bar{X}_2)$
$\hat{X}_3 = b_{32}X_2 + (\bar{X}_3 - b_{32}\bar{X}_2)$

and shows the errors of estimate $Y - \hat{Y}$ and $X_3 - \hat{X}_3$.
The correlation between these errors of estimate is

$$r_{(y-\hat{y})(x_3-\hat{x}_3)} = \frac{\Sigma(Y - \hat{Y})(X_3 - \hat{X}_3)}{\sqrt{\Sigma(Y - \hat{Y})^2\;\Sigma(X_3 - \hat{X}_3)^2}} = \frac{116.29}{381.65} = .30$$

This correlation, which has sometimes been called a net correlation,
is the partial correlation denoted by the symbol $r_{y3.2}$.
If both $X_1$ and $X_2$ had been used in the regression equation, the
correlation between

$Y - \hat{Y} = Y - \bar{Y} - b_{y1.2}(X_1 - \bar{X}_1) - b_{y2.1}(X_2 - \bar{X}_2)$
and $X_3 - \hat{X}_3 = X_3 - \bar{X}_3 - b_{31.2}(X_1 - \bar{X}_1) - b_{32.1}(X_2 - \bar{X}_2)$

could have been found to be $r_{y3.12} = .08$.

                  r_y3.1 = .2
r_y3 = .47                          r_y3.12 = .08
                  r_y3.2 = .3


TABLE 13.8 Correlation among Residual Errors for Ten Cases
Chosen at Random from Table 13.1

Case     X_2    X_3     Y      X̂_3      Ŷ     X_3-X̂_3    Y-Ŷ    (X_3-X̂_3)(Y-Ŷ)
  1       53     28     62     35.8    56.8     -7.8      5.2      -40.56
  …        …      …      …       …       …      -1.6     11.7      -18.72
 32        …      …      …       …       …      -5.6    -10.5       58.80
 19        …      …      …       …       …       1.5     -7.9      -11.85
  …       57     42     64     37.2    59.0      4.8      5.0       24.00
 88       58     42     59     37.6    59.5      4.4      -.5       -2.20
 47       53     37     62     35.8    56.8      1.2      5.2        6.24
  …        …      …      …       …       …       4.7     10.4       48.88
 97       34     33     41     29.2    46.5      3.8     -5.5      -20.90
  2       52     30     43     35.5    56.2     -5.5    -13.2       72.60

Sum      521    355    563    355.1   563.1      -.1      -.1      116.3
Mean    52.1   35.5   56.3    35.51   56.31     -.01     -.01
Σ(dev)² 616.9  284.5  884.1    75.6   181.4    207.6    701.5

$$r_{y3.2} = \frac{\Sigma(X_3 - \hat{X}_3)(Y - \hat{Y})}{\sqrt{\Sigma(X_3 - \hat{X}_3)^2}\,\sqrt{\Sigma(Y - \hat{Y})^2}} = \frac{116.3}{\sqrt{(207.6)(701.5)}} = .30$$

$$r_{y3.2} = \frac{r_{y3} - r_{y2}\,r_{32}}{\sqrt{1 - r^2_{y2}}\,\sqrt{1 - r^2_{32}}} = \frac{.47 - (.45)(.52)}{(.893)(.854)}$$

$$r_{y3.12} = \frac{r_{y3.1} - r_{y2.1}\,r_{32.1}}{\sqrt{1 - r^2_{y2.1}}\,\sqrt{1 - r^2_{32.1}}} = .08$$

From these figures we note that variation in reading score and 
variation in artificial language score both have a considerable 
effect on the relation between algebra-arithmetic score and mid- 
term test. For corresponding data for the 98 cases, see Exercise 
13.2, question 6. 
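
The definition of the partial correlation as a correlation between errors of estimate can be traced numerically from the last columns of Table 13.8. The sketch below uses the ten pairs of residuals as read from that table. As in the formula in the text, it correlates the residuals through their raw sums of squares and products, which is legitimate because least-squares residuals have mean zero apart from rounding; the function name is hypothetical.

```python
import math

# Errors of estimate for the ten sampled cases (X_3 - predicted X_3, Y - predicted Y).
x3_resid = [-7.8, -1.6, -5.6, 1.5, 4.8, 4.4, 1.2, 4.7, 3.8, -5.5]
y_resid = [5.2, 11.7, -10.5, -7.9, 5.0, -0.5, 5.2, 10.4, -5.5, -13.2]


def correlation_of_residuals(u, v):
    """Sum of products divided by the root of the product of sums of squares."""
    sum_uv = sum(a * b for a, b in zip(u, v))
    sum_uu = sum(a * a for a in u)
    sum_vv = sum(b * b for b in v)
    return sum_uv / math.sqrt(sum_uu * sum_vv)


sum_products = sum(a * b for a, b in zip(x3_resid, y_resid))
r_y3_2 = correlation_of_residuals(x3_resid, y_resid)

print("sum of products %.2f, partial r %.3f" % (sum_products, r_y3_2))
```

The sums of squares of the two residual columns reproduce the 207.6 and 701.5 entered at the foot of the table.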

The subscripts preceding the point are called primary subscripts;
those following the point, secondary subscripts. The
number of secondary subscripts is the order. Thus $r_{y3.12}$ is a
second-order partial; $r_{y3.1}$ and $r_{y3.2}$ are first-order partials; $r_{y3}$ is
a zero-order coefficient.

Any partial correlation coefficient can be obtained from
coefficients of the next lower order. Thus

(13.32)
$$r_{y3.1} = \frac{r_{y3} - r_{y1}\,r_{31}}{\sqrt{1 - r^2_{y1}}\,\sqrt{1 - r^2_{31}}}
\qquad\text{and}\qquad
r_{y3.2} = \frac{r_{y3} - r_{y2}\,r_{32}}{\sqrt{1 - r^2_{y2}}\,\sqrt{1 - r^2_{32}}}$$

If additional secondary subscripts are annexed to every r in
Formula (13.32), the result is the formula for a partial of higher
order. Thus

(13.33)
$$r_{y3.12} = \frac{r_{y3.2} - r_{y1.2}\,r_{31.2}}{\sqrt{1 - r^2_{y1.2}}\,\sqrt{1 - r^2_{31.2}}}
\qquad\text{and}\qquad
r_{y3.21} = \frac{r_{y3.1} - r_{y2.1}\,r_{32.1}}{\sqrt{1 - r^2_{y2.1}}\,\sqrt{1 - r^2_{32.1}}}$$

Since the arrangement of secondary subscripts is immaterial,
$r_{y3.12} = r_{y3.21}$. The two Formulas (13.33) are algebraically identical
but seldom give identical results in computation because rounding
errors take a heavy toll. To obtain a partial correlation with a
large number of secondary subscripts it is possible to work up step
by step from zero-order coefficients to first order, to second order,
etc. but the amount of labor involved increases rapidly as the order
increases. If the secondary subscripts are numerous it is usually
easier to use the Formula

(13.34)  $r^2_{12.3456} = b^*_{12.3456}\,b^*_{21.3456} = b_{12.3456}\,b_{21.3456}$

obtaining each of the b*'s by a Doolittle solution.
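
The step-by-step build-up from zero-order coefficients lends itself to a short recursive routine. The sketch below is one possible arrangement, with a hypothetical function `partial_r` and the zero-order correlations of the 98-case example; it removes one secondary subscript at a time exactly as Formulas (13.32) and (13.33) do.

```python
import math


def partial_r(r, a, b, controls):
    """Partial correlation of a and b holding `controls` constant.

    r maps frozenset({u, v}) to the zero-order correlation of u and v;
    controls is a tuple of variable labels (the secondary subscripts).
    """
    if not controls:
        return r[frozenset((a, b))]
    k, rest = controls[-1], controls[:-1]      # peel off one secondary subscript
    r_ab = partial_r(r, a, b, rest)
    r_ak = partial_r(r, a, k, rest)
    r_bk = partial_r(r, b, k, rest)
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


# Zero-order correlations for the midterm example (y = criterion, 1-3 = predictors).
zero_order = {
    frozenset(("y", "1")): 0.357, frozenset(("y", "2")): 0.620,
    frozenset(("y", "3")): 0.518, frozenset(("1", "2")): 0.321,
    frozenset(("1", "3")): 0.477, frozenset(("2", "3")): 0.539,
}

r_y3_1 = partial_r(zero_order, "y", "3", ("1",))
r_y3_2 = partial_r(zero_order, "y", "3", ("2",))
r_y3_12 = partial_r(zero_order, "y", "3", ("1", "2"))
r_y3_21 = partial_r(zero_order, "y", "3", ("2", "1"))

print("%.3f %.3f %.3f %.3f" % (r_y3_1, r_y3_2, r_y3_12, r_y3_21))
```

In floating-point arithmetic the two orders of elimination agree almost exactly; the disagreement the text warns about arises in hand work, where each intermediate coefficient is rounded to a few decimals.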

The study of partial correlation suggests one reason why 
correlation coefficients are difficult to interpret. There is no rule 
which the research worker can follow to tell him whether he needs 
a zero-order correlation or a partial. He is obliged to answer that 
question by intense and critical thinking about his data. Some- 
times the zero-order r is really spurious, sometimes the partial, 
depending on the use to be made of it. The zero-order correlation 
of intelligence quotient with reading ability for all the children 
in a school system will be positive and is likely to be fairly high. 
However, if computed for children in the fourth grade only the 
correlation between intelligence quotient and reading ability is 
likely to be negative because within the single grade the brighter 
children are the younger and in the fourth grade the older children 
have a certain advantage in reading. Here the variable of “grade 
in school” has been literally held constant. The effect on the 
correlation coefficient is similar to, but not identical with, the 
effect of taking a partial in which a secondary subscript denotes 
“grade in school.”

Test of Significance for a Partial Correlation. All the tests
of significance for a zero-order correlation described in Chapter 10
may be applied to a partial if the number of degrees of freedom
is decreased by 1 for each secondary subscript.

Relation of Partial and Multiple Correlation Coefficients.
Using a 4-predictor problem as illustration, the multiple R may
be written in terms of partial r's and a zero-order r as follows:

(13.35)
$$\begin{aligned}
1 - R^2_{y.1234} &= (1 - r^2_{y1})(1 - r^2_{y2.1})(1 - r^2_{y3.12})(1 - r^2_{y4.123}) \\
&= (1 - r^2_{y2})(1 - r^2_{y3.2})(1 - r^2_{y1.23})(1 - r^2_{y4.123}) \\
&= (1 - r^2_{y4})(1 - r^2_{y3.4})(1 - r^2_{y1.34})(1 - r^2_{y2.134}) \\
&\ \text{etc.}
\end{aligned}$$

This formula does not provide a very economical computing
routine but it furnishes some interesting comparisons which enable
us to set a lower bound for a multiple R without computing it.
Each of the parentheses on the right of Formula (13.35) contains
a quantity which is smaller than 1 (except in the unusual case
of r = 0). The product of several such quantities is smaller than
any one of them. Therefore

$1 - R^2_{y.1234} < 1 - r^2_{y4.123}$
and $R_{y.1234} > r_{y4.123}$

The right-hand member of (13.35) could be written in 4! = 24
different ways, all algebraically identical, if the various partials
are expressed in terms of zero-order r's. These 24 ways of writing
the formula would involve every r having y as one of its primary
subscripts, and each of these r's would be smaller than $R_{y.1234}$.
However, there is no necessary relation between $R_{y.1234}$ and other
r's for which y is not a primary subscript, as for example, $r_{23.1}$
or $r_{12.y}$.
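
Formula (13.35) can be tried on the three-predictor midterm example, where the product has only three factors. The sketch below assumes that shortened form, $1 - R^2_{y.123} = (1 - r^2_{y1})(1 - r^2_{y2.1})(1 - r^2_{y3.12})$, and builds the partials from the zero-order correlations quoted in this chapter; the helper name `first_order` is hypothetical.

```python
import math


def first_order(r_ab, r_ak, r_bk):
    """Remove variable k from the correlation of a and b."""
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


r_y1, r_y2, r_y3 = 0.357, 0.620, 0.518
r_12, r_13, r_23 = 0.321, 0.477, 0.539

# First-order partials with X_1 held constant.
r_y2_1 = first_order(r_y2, r_y1, r_12)
r_y3_1 = first_order(r_y3, r_y1, r_13)
r_23_1 = first_order(r_23, r_12, r_13)
# Second-order partial: annex the subscript 1 throughout and remove X_2.
r_y3_12 = first_order(r_y3_1, r_y2_1, r_23_1)

product = (1 - r_y1 ** 2) * (1 - r_y2_1 ** 2) * (1 - r_y3_12 ** 2)
R_multiple = math.sqrt(1 - product)

print("1 - R^2 = %.4f, R = %.3f" % (product, R_multiple))
```

The resulting R agrees with the .663 found earlier, and it exceeds each of the three r's appearing in the product, as the lower-bound argument requires.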

Consistency Among Coefficients. The relations between zero-order
and partial coefficients are less definite than that which
provides a lower bound for the multiple correlation. Occasionally
a research worker encounters a partial coefficient which is larger
than the zero-order coefficient with the same primary subscripts.
For example $r_{12} = .30$, $r_{13} = .10$ and $r_{23} = .80$ would yield $r_{12.3} = .37$.
More often in the data with which social scientists deal the partial
coefficients are smaller than the related zero-order coefficients.
Occasionally $r_{13}$ and $r_{13.2}$ have opposite signs. For example for
the r's just quoted $r_{13} = .10$ and $r_{13.2} = -.245$. The three coefficients
$r_{12}$, $r_{13}$ and $r_{23}$ are always related in such a way that none
of the partials is numerically larger than 1. This is called a
consistency relation and it amounts to the requirement that

(13.36)  $1 - r^2_{12} - r^2_{13} - r^2_{23} + 2r_{12}r_{13}r_{23} \ge 0$

Thus $r_{12} = .80$, $r_{13} = .60$, $r_{23} = -.20$ are inconsistent because they
would lead to partials larger than 1, and
$1 - r^2_{12} - r^2_{13} - r^2_{23} + 2r_{12}r_{13}r_{23} = -.232$
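
Both numerical examples in this section are easy to verify. The sketch below evaluates the left member of (13.36) and the first-order partial for the consistent and the inconsistent sets of r's; the function names are illustrative and not taken from the text.

```python
import math


def consistency_value(r12, r13, r23):
    """Left member of (13.36); a negative value marks an impossible set of r's."""
    return 1 - r12 ** 2 - r13 ** 2 - r23 ** 2 + 2 * r12 * r13 * r23


def first_order(r_ab, r_ak, r_bk):
    """Partial correlation of a and b with k held constant."""
    return (r_ab - r_ak * r_bk) / math.sqrt((1 - r_ak ** 2) * (1 - r_bk ** 2))


# Consistent set: r12 = .30, r13 = .10, r23 = .80.
ok_value = consistency_value(0.30, 0.10, 0.80)
r_12_3 = first_order(0.30, 0.10, 0.80)     # larger than the zero-order .30
r_13_2 = first_order(0.10, 0.30, 0.80)     # opposite in sign to the zero-order .10

# Inconsistent set: r12 = .80, r13 = .60, r23 = -.20.
bad_value = consistency_value(0.80, 0.60, -0.20)
bad_partial = first_order(0.80, 0.60, -0.20)

print("%.3f %.3f %.3f" % (ok_value, r_12_3, r_13_2))
print("%.3f %.3f" % (bad_value, bad_partial))
```

For the inconsistent set the routine returns a "partial" greater than 1, which is the numerical symptom of the negative value in (13.36).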

EXERCISE 13.2 

1. Write the formula for each of the following partial-correlation 
coefficients in terms of zero-order coefficients: (a) ras; (b) тат; (с) rn. 

2. Write the formula for rus in terms of first-order partials having 
5 as secondary subscript. 

3. Write the formula for ra.su in terms of third-order partials having 
2, 5, and 6 as secondary subscripts. 

4. Write the formula for ru. in terms of the related regression coeffi- 
cients, 

5. Write the formula for Ёз in the manner of Formula (13.35). 

6. For the data for the 98 cases of Table 13.1, using the zero-order
correlation coefficients in Table 13.3, verify the following partial
coefficients:

                   r_y3.1 = .424
r_y3 = .518                          r_y3.12 = .218
                   r_y3.2 = .278


Iterative Method of Obtaining Regression Weights. An iterative
method of obtaining the b*'s for a multiple regression
equation, usually called the Kelley-Salisbury Method ***, will
be described. It is useful only when some small errors in the
b*'s can be tolerated. When the number of predictor variables
is small it does not save labor. Those who like to use iterative
methods consider that when the predictor variables are numerous,
this method is quicker than the Doolittle method. However it
does not provide any method of testing the significance of the
b*'s individually. It will be illustrated here from the data already
used for the Doolittle solution.

The Kelley-Salisbury method is based on the normal equations.
For the present data these are

     b*_y1.23 + .321b*_y2.13 + .477b*_y3.12 = .357
.321b*_y1.23 +     b*_y2.13 + .539b*_y3.12 = .620
.477b*_y1.23 + .539b*_y2.13 +     b*_y3.12 = .518

The procedure is to make a rough estimate of the values of
the b*'s (called $W_1$ on line 2 on page 346), substitute those estimates
in the normal equations and obtain a first approximation to the
correlations with the criterion (called $r_1$ on line 3). Now, comparing
the first approximations $r_1$ with the actual correlations we see
that the approximation is close for $r_{y2}$ but is considerably too large
for $r_{y1}$ and $r_{y3}$. We shall try reducing the estimate for one or the
other of the corresponding b*'s. Since $r_{y1}$ shows the largest
discrepancy we will reduce the entry for column 1. It is not necessary
at this stage to seek for great precision, so we will reduce
the estimate from .2 to .1 and will record .1 - .2 = -.1 as the
amount of the change in column 1. It is now possible to substitute
the new weights .1, .4 and .3 in the normal equations or merely to
note that the estimate of $r_{y1}$ will be changed by -.1, the estimate
of $r_{y2}$ by -.1(.321) and the estimate of $r_{y3}$ by -.1(.477), producing
the second estimates $r_2$ recorded in row 4. The iterative process
may be carried on until agreement between estimated r's and
observed is close enough to satisfy the computer.


                                              Estimate                   Change
Step                                      X_1     X_2     X_3       X_1     X_2     X_3

 1. r_yj = correlation with criterion    .357    .620    .518
 2. W_1 = first estimate of b*           .2      .4      .3
 3. r_1 = first approximation to r_yj    .4715   .6259   .6110     -.10
 4. r_2                                  .3715   .5938   .5633                     -.05
 5. r_3                                  .3476   .5669   .5133             +.05
 6. r_4                                  .3637   .6169   .5403
 7. W_2 = revised estimate of b*         .10     .45     .25
 8. r_4 (check)                          .3637   .6169   .5403                     -.022
 9. r_5                                  .3532   .6050   .5183             +.015
10. r_6                                  .3580   .6200   .5264                     -.008
11. r_7                                  .3542   .6157   .5184     +.003
12. r_8                                  .3572   .6167   .5197
13. W_3 = revised estimate of b*         .103    .465    .220
14. r_8 (check)                          .3572   .6167   .5197             +.005
15. r_9                                  .3588   .6217   .5224                     -.004
16. r_10                                 .3569   .6195   .5184
17. W_4 = final estimate of b*           .103    .470    .216
18. b*_yj = value obtained from          .102    .470    .216
    Doolittle solution, presented
    here for comparison

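
The worksheet can be imitated by a short routine. The sketch below is a mechanical variant and not the Kelley-Salisbury procedure as a human computer would carry it out: instead of choosing convenient round changes by judgment, it always adjusts the weight whose estimated correlation is furthest from the observed one, and it changes that weight by the full discrepancy. The function name `relax_weights`, the tolerance and the step limit are assumptions made for the illustration.

```python
def relax_weights(R, r_obs, start, tol=1e-6, max_steps=10_000):
    """Adjust one weight at a time until estimated r's match the observed r's."""
    m = len(r_obs)
    w = list(start)
    # Estimated correlations implied by the current weights (normal equations).
    est = [sum(R[i][j] * w[j] for j in range(m)) for i in range(m)]
    steps = 0
    while steps < max_steps:
        gaps = [est[i] - r_obs[i] for i in range(m)]
        j = max(range(m), key=lambda i: abs(gaps[i]))   # worst-fitting column
        if abs(gaps[j]) < tol:
            break
        change = -gaps[j]
        w[j] += change
        # A change in weight j alters every estimate by change * r_ij,
        # just as the worksheet updates a row without full resubstitution.
        for i in range(m):
            est[i] += change * R[i][j]
        steps += 1
    return w, est, steps


R = [[1.000, 0.321, 0.477],
     [0.321, 1.000, 0.539],
     [0.477, 0.539, 1.000]]
r_obs = [0.357, 0.620, 0.518]

weights, estimates, n_steps = relax_weights(R, r_obs, start=[0.2, 0.4, 0.3])
print(["%.3f" % w for w in weights], n_steps)
```

Starting from the same rough estimates .2, .4 and .3, the routine settles on weights that agree with line 17 of the worksheet to about the third decimal.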
14 Analysis of Variance with Two or More Variables
of Classification

Some of the simpler applications of analysis of vari- 
ance were considered in Chapter 9. This type of analysis may 
be applied to a great variety of complex experiments, some of 
which provide a method of studying the effects of two or more 
factors in the same experiment. As was the case in Chapters 7 
and 9 it will be necessary to distinguish between situations in 
which only one observation is made on each individual and situa- 
tions in which several observations are made on each individual. 

Two Bases of Classification with n Individuals in Each Cell. 
An experiment on the teaching of reading to backward readers is 
described by Burt and Lewis‘. A group of 48 backward readers 
was given instruction in reading by four methods called A. Alpha- 
betic, B. Kinesthetic, C. Phonic, and D. Visual. In the following 
table, the improvement achieved by each of the pupils is recorded
as 100 times the ratio of his final to his original reading score. 
Each ratio is classified both by the method used in remedial 
instruction and by the method previously used in the school which 
the pupil attended regularly. These ratios multiplied by 100 will 
now, for the sake of simplicity, be called scores. Note that the 
scores appear in 16 groups, three scores in a group. One of the
groups corresponds to each possible pairing of method of instruc- 
tion used in school and the method used in remedial teaching. 
The sixteen groups are customarily called subclasses. Since the 
subclasses are arranged in rows and columns it is convenient to 
speak of the groups of subclasses as rows and columns. 

The advantage of this arrangement or experimental design 
is that each remedial method is applied equally often to students 
who have been taught previously by each of the four methods. 
Thus the outcomes of remedial instruction are unaffected by 
peculiarities of prior instruction. It is important that the 12 pupils 
in each column be assigned randomly to the four remedial methods
to assure that variation in individual abilities does not bias the 
results of the experiment. 

Before writing the necessary formulas it will be helpful to
review the notation previously used in similar situations. A score
in the subclass in the ith row and jth column will be denoted by
the symbol $X_{ij\alpha}$ so that the three scores in the first row and second
column are $X_{121} = 114.2$, $X_{122} = 101.6$ and $X_{123} = 113.0$. The mean
of the scores in a subclass will be denoted by the symbol $\bar{X}_{ij}$, the
mean of a row by $\bar{X}_{i.}$, the mean of a column by $\bar{X}_{.j}$ and the mean
of all scores by $\bar{X}$. The corresponding population means are $\mu_{ij}$,
$\mu_{i.}$, $\mu_{.j}$ and $\mu$.

The usual assumptions of normality and equality of standard
deviation are made. All possible observations which may appear
in a subclass are assumed to be independently and normally
distributed about the mean, $\mu_{ij}$, of the subclass, and each of these
populations of observations is assumed to have the same standard
deviation $\sigma$. Thus each group of three observations in a subclass
of Table 14.1 is a sample from a normal population with the mean
and standard deviation indicated above.

By use of the parameters three hypotheses can be formulated:

1. The hypothesis that the row means are equal. This hypothesis
may be denoted by the symbol $H_r$. For the data in Table 14.1
the hypothesis is

$H_r$: $\mu_{1.} = \mu_{2.} = \mu_{3.} = \mu_{4.}$

In terms of these data the hypothesis implies that in the population
the four remedial methods are equally effective.

2. The hypothesis that the column means are equal. This hypothesis
may be written

$H_c$: $\mu_{.1} = \mu_{.2} = \mu_{.3} = \mu_{.4}$

In terms of the data in Table 14.1 this hypothesis implies that
students improve equally well under remedial instruction regardless
of the method of teaching originally used in their schools.

3. The hypothesis that the interaction is zero. In terms of the
parameters this hypothesis may be written

$H_{rc}$: $\mu_{ij} = (\mu_{i.} - \mu) + (\mu_{.j} - \mu) + \mu$
or $\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$.

The interpretation of this hypothesis is that each subclass mean
($\mu_{ij}$) can be found from the general mean by adding to the latter
TABLE 14.1 Improvement Scores of Pupils Given Remedial Instruction in Reading
Classified According to Method of Original Instruction and Method of Remedial
Instruction*

Method of           Improvement scores of pupils receiving
remedial            original instruction by a given method
instruction           A         B         C         D         Sum      Mean

A. Alphabetic        98.7     114.2     102.8     103.2
                    109.4     101.6      96.1     111.5
                    100.2     113.0     110.2     106.4
   Sum              308.3     328.8     309.1     321.1     1267.3    105.61
   Mean             102.77    109.6     103.03    107.03

B. Kinesthetic      118.6     102.5     113.5     103.9
                    106.0     106.7     117.4     119.1
                    116.1     110.4     116.8     116.2
   Sum              340.7     319.6     347.7     339.2     1347.2    112.27
   Mean             113.57    106.53    115.9     113.07

C. Phonic           107.5     106.4     111.2     101.6
                    112.6      98.4     100.8     105.5
                    105.3      93.6     109.6      94.7
   Sum              325.4     298.4     321.6     301.8     1247.2    103.93
   Mean             108.47     99.47    107.2     100.6

D. Visual           128.1     113.4     119.8     101.2
                    119.0     119.2     106.6     108.0
                    126.5     111.3     107.9     107.6
   Sum              373.6     343.9     334.3     316.8     1368.6    114.05
   Mean             124.53    114.63    111.43    105.6

Grand Total        1348.0    1290.7    1312.7    1278.9     5230.3
Grand Mean          112.33    107.56    109.39    106.58    108.96

* From Burt and Lewis‘.


the amount by which the row mean deviates from the general
mean ($\mu_{i.} - \mu$) and the amount by which the column mean deviates
from the general mean ($\mu_{.j} - \mu$), row and column being those in
which the subclass is located. In terms of the data in Table 14.1
the hypothesis means that the variation from subgroup to subgroup
may be explained by the addition of components due to the
method of school instruction and method of remedial instruction;
that the effectiveness of a method of remedial instruction is the
same regardless of what method preceded it.

If the hypothesis $H_{rc}$ is true, then we say that there is no
interaction between the row and column components or that each
exercises an influence apart from the other. If this hypothesis
is not true we say that the interaction of the components produces
effects which cannot be explained merely by adding the row and
column components. The reader will notice that the hypothesis
$H_{rc}$ was also formulated in Chapter 9. In that chapter there was
only one case in each subclass and it was, therefore, impossible
to test for interaction. In this chapter we shall be concerned with
situations where there are several cases in each subclass and a test
for $H_{rc}$ will be made.

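To test the three hypotheses just formulated we shall subdivide
into components the total sum of squares of deviations of the
individual scores from the mean of all the scores. In Table 14.2
this subdivision will be shown, together with the degrees of
freedom appropriate to each component and a further subdivision
of the component sums of squares to simplify computations.
We shall suppose in the table that there are r rows, c columns and
n cases in each subclass. Hence the total number of cases is
nrc = N.

TABLE 14.2 Symbolic Description of Analysis of Variance in a Two-way
Layout With Equal Numbers of Cases in the Subclasses

Source of variation: Total
  Degrees of freedom: $nrc - 1$
  Sum of squares: $\sum_i\sum_j\sum_\alpha (X_{ij\alpha} - \bar{X})^2$
  Computational formula: $\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \dfrac{T^2}{nrc}$

Source of variation: Rows
  Degrees of freedom: $r - 1$
  Sum of squares: $nc\sum_i (\bar{X}_{i.} - \bar{X})^2$
  Computational formula: $\dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{T^2}{nrc}$

Source of variation: Columns
  Degrees of freedom: $c - 1$
  Sum of squares: $nr\sum_j (\bar{X}_{.j} - \bar{X})^2$
  Computational formula: $\dfrac{1}{nr}\sum_j T^2_{.j} - \dfrac{T^2}{nrc}$

Source of variation: Interaction
  Degrees of freedom: $(r - 1)(c - 1)$
  Sum of squares: $n\sum_i\sum_j (\bar{X}_{ij} - \bar{X}_{i.} - \bar{X}_{.j} + \bar{X})^2$
  Computational formula: $\dfrac{1}{n}\sum_i\sum_j T^2_{ij} - \dfrac{1}{nc}\sum_i T^2_{i.} - \dfrac{1}{nr}\sum_j T^2_{.j} + \dfrac{T^2}{nrc}$

Source of variation: Within subclasses
  Degrees of freedom: $rc(n - 1)$
  Sum of squares: $\sum_i\sum_j\sum_\alpha (X_{ij\alpha} - \bar{X}_{ij})^2$
  Computational formula: $\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \dfrac{1}{n}\sum_i\sum_j T^2_{ij}$

It will be convenient to use the notation: $T_{ij}$ = the sum of the
scores in the subsample which is located in the ith row and jth
column. $T_{i.}$ is the sum of all the scores in the ith row and $T_{.j}$
the sum in the jth column. T is the sum of all the scores.
The computational formulas in Table 14.2 will now be applied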
to the data in Table 14.1. The calculations will be set out in a
number of steps.

(14.1) Correction term:

$$\frac{T^2}{nrc} = \frac{(5230.3)^2}{(3)(4)(4)} = 569{,}917.5$$

(14.2) Total sum of squares:

$$\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \frac{T^2}{nrc} = (98.7)^2 + (109.4)^2 + \cdots + (107.6)^2 - 569{,}917.5 = 2883.2$$

(14.3) Sum of squares for rows (remedial methods):

$$\frac{1}{nc}\sum_i T^2_{i.} - \frac{T^2}{nrc} = \tfrac{1}{12}\left[(1267.3)^2 + (1347.2)^2 + (1247.2)^2 + (1368.6)^2\right] - 569{,}917.5$$
$$= 570{,}797.6 - 569{,}917.5 = 880.1$$

(14.4) Sum of squares for columns (methods used in school):

$$\frac{1}{nr}\sum_j T^2_{.j} - \frac{T^2}{nrc} = \tfrac{1}{12}\left[(1348.0)^2 + (1290.7)^2 + (1312.7)^2 + (1278.9)^2\right] - 569{,}917.5$$
$$= 570{,}148.1 - 569{,}917.5 = 230.6$$

(14.5) Sum of squares for interaction:

$$\frac{1}{n}\sum_i\sum_j T^2_{ij} - \frac{1}{nc}\sum_i T^2_{i.} - \frac{1}{nr}\sum_j T^2_{.j} + \frac{T^2}{nrc} = 571{,}793.1 - 570{,}797.6 - 570{,}148.1 + 569{,}917.5 = 764.9$$

(14.6) Sum of squares for variation within subclasses:

$$\sum_i\sum_j\sum_\alpha X^2_{ij\alpha} - \frac{1}{n}\sum_i\sum_j T^2_{ij} = (98.7)^2 + (109.4)^2 + \cdots + (107.6)^2 - \tfrac{1}{3}\left[(308.3)^2 + (328.8)^2 + \cdots + (316.8)^2\right]$$
$$= 572{,}800.7 - 571{,}793.1 = 1007.6$$


As in Chapter 9, the F distribution will be used to test each of the
three hypotheses. First, each of the three sums of squares must
be divided by the appropriate number of degrees of freedom to
produce a mean square. These mean squares are listed in Table
14.3 where the results of the preceding computations have been
summarized. The values shown in the column headed F in that
table were obtained by dividing each of the other mean squares
by the subclass mean square 31.5. Values of $F_{.95}$ and $F_{.99}$ read
from the probability tables with the appropriate degrees of
freedom have been entered in the columns at the right of Table
14.3.


TABLE 14.3 Analysis of Variance of Improvement Scores Shown in Table 14.1

Source of                  Sum of     Degrees of    Mean
variation                  squares    freedom       square     F       F.95    F.99

Methods of remedial
  instruction               880.1        3          293.4     9.31     2.90    4.46

Methods of original
  teaching                  230.6        3           76.9     2.44     2.90    4.46

Interaction                 764.9        9           85.0     2.70     2.19    3.01

Individuals within
  subclasses               1007.6       32           31.5

To test the hypothesis that remedial methods give equal
results, F = 9.31 is compared with $F_{.99}$ = 4.46. It is evident that
the null hypothesis can be rejected at the .01 level. (In fact, it
can even be rejected at the .001 level, for $F_{.999}$ is only 6.94, but
that value is not recorded in the F tables provided in this text.)
It must therefore be concluded that remedial methods do differ
in their effectiveness.

To test the second hypothesis, F = 2.44 may be compared with
$F_{.95}$ = 2.90. It must be concluded that there is no justification
for asserting that the methods originally used in teaching reading 
differentiate the groups after they have received remedial instruc- 
tion. 

To test the third hypothesis, F = 2.70 may be compared with
$F_{.95}$ = 2.19 and $F_{.99}$ = 3.01. At the .05 level of significance the null
hypothesis would be rejected. A very cautious person, preferring
to work at the .01 level, might retain the null hypothesis. That
hypothesis is that the final results after remedial work can be 
explained by the additive effects of the methods originally used 
in school and the methods used in remedial teaching. As most 
people will probably use the .05 level of significance, they will 
conclude that some combinations of the two methods produce 
better results than others. 

Having discovered a significant interaction, the research 
worker naturally wants to discover its source. If he is an experienced
research worker he will have foreseen this possibility and
will have tried to think ahead of time of meaningful comparisons
which might be made among groups of subclasses. The person
planning this experiment would know something about learning
theory and would therefore probably decide that, if interaction should
prove significant, he would compare the gain of the
12 pupils for whom the remedial method was the same as the
original method with the gain of the 36 pupils for whom the method
was changed. Even if he did not think of this idea before he saw
the data, he might notice — while poring over Table 14.1 — that 
in each column the mean in the diagonal cell, that is the mean 
under repetition of method, is smaller than the grand mean for the 
column. If the explanation concerning the source of the inter- 
action occurred to him before he saw the data, he could test the 
null hypothesis in standard manner. If the explanation was 
suggested by inspection of the data he would probably present it as 
intelligent speculation not verified by a statistical test. The good 
research worker habitually speculates about matters which go 
beyond his data, but he is at great pains to make clear when he is 
speculating and when he is reporting. 

4. The hypothesis that the mean of the 4 diagonal cells is equal
to the mean of the 12 non-diagonal cells. The data in Table 14.1
show the following: 


Mean of 12 scores in diagonal cells = 105.5 
Mean of 36 scores in non-diagonal cells = 110.1 
These means suggest that greater gain is achieved when the 
remedial method differs from the original method of instruction 
than when the two methods are the same. 
To test the significance of this difference between the means
we can apply "Student's ratio" as in Chapter 7. The standard
error of the difference is

\[ \sigma\sqrt{\frac{1}{12} + \frac{1}{36}} \]

As an estimate of $\sigma^2$ we use the error mean square from Table 14.3,
so that $\sigma$ is estimated by

\[ s = \sqrt{31.5} \]

Then "Student's ratio," computed from the unrounded means, is

\[ t = \frac{110.1 - 105.5}{\sqrt{31.5}\,\sqrt{\frac{1}{12} + \frac{1}{36}}} = 2.45 \]

The chance is less than .02 that a difference as great as the one
indicated shall occur by chance as a result of random sampling 
from populations in which the means are equal. The difference 
may, therefore, be regarded as significant from the point of view 
of random sampling. 
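
The ratio can be checked numerically. The following Python sketch is an illustration added here, with variable names chosen for the example; it takes the totals of the diagonal and non-diagonal cells (1266.3 and 3964.0, the values quoted in the next section) and the error mean square 31.5 from Table 14.3, and it works from unrounded means.

```python
import math

# Values quoted in the text; the variable names are illustrative.
total_same, n_same = 1266.3, 12        # diagonal cells: method repeated
total_changed, n_changed = 3964.0, 36  # non-diagonal cells: method changed
error_mean_square = 31.5               # Table 14.3, 32 degrees of freedom

mean_same = total_same / n_same
mean_changed = total_changed / n_changed

# Standard error of the difference between two independent means
# that share one population variance.
se_diff = math.sqrt(error_mean_square * (1 / n_same + 1 / n_changed))
t_ratio = (mean_changed - mean_same) / se_diff

print(f"means: {mean_same:.1f} and {mean_changed:.1f}")
print(f"standard error of the difference: {se_diff:.3f}")
print(f"Student's ratio: {t_ratio:.2f}")
```

Working from the rounded means 110.1 and 105.5 instead gives a ratio of about 2.46, so the small discrepancy a reader may notice is a matter of rounding only.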

Subdivision of Sum of Squares. The comparison just com- 
pleted can be made in another way which leads to the conclusion 
reached above and which provides additional information. 

Call $T_1$ the sum of the scores in the diagonal cells and $T_2$ the
sum in the remaining cells. The comparison can be made by
using $T_1$ and $T_2$ directly if the difference in the number of cases
is taken into account. Since $T_1$ is based on 12 cases and $T_2$ on
36, the comparison of totals can be made if we equalize the number
of cases by writing $3T_1 - T_2$.

We can now form the sum of squares due to this comparison
by the expression

\[ \frac{(3T_1 - T_2)^2}{12(3)^2 + 36(-1)^2} \]

Since $T_1 = 1266.3$ and $T_2 = 3964.0$ this sum of squares is 189.3.

If this sum of squares is subtracted from the total sum of 
squares for interaction the difference provides an independent 
component of sum of squares with value 764.9 — 189.3 = 575.6. 
Each of these sums of squares can be tested for significance by 
using as error variance the mean square for variation among 
individuals within subclasses which appears in Table 14.3. The 
analysis of variance appears in Table 14.4. 


TABLE 14.4 Subdivision of Interaction in Experiment on Reading 


Source of                     Sum of     d.f.     Mean       F      F.95
variation                     squares             square

Between diagonal cells
  and all others               189.3       1      189.3     6.01    4.15
Other sources of variation
  among cells                  575.6       8       72.0     2.29    2.25
Individuals within cells      1007.6      32       31.5


Both F ratios are significant at the 5% level. It appears that 
there are significant elements of interaction other than the difference
between diagonal and non-diagonal sums.

It is interesting to compare the result of this section with that 
of the previous section. In the previous section a t ratio of 2.45
was found, but $(2.45)^2 = 6.00$ and this differs from the F ratio
of 6.01 only because of rounding errors. It is evident that the
computing procedures of the two sections lead to the same conclusion,
but the procedure of this section provides a means of
subdividing a sum of squares. Such subdivision will be discussed
more fully in the next section.
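
That the two procedures agree is not an accident of these data: with one degree of freedom in the numerator, F is algebraically the square of t. The sketch below, an illustration with names chosen for the example, computes both from the same totals so the agreement can be seen without rounding.

```python
# Totals and numbers of cases for the diagonal and non-diagonal cells.
T1, N1 = 1266.3, 12
T2, N2 = 3964.0, 36
l1, l2 = 3, -1          # multipliers of the comparison 3*T1 - T2
error_ms = 31.5         # within-subclass mean square from Table 14.3

# Sum of squares for the comparison; with 1 degree of freedom
# it is also the mean square.
ss_comparison = (l1 * T1 + l2 * T2) ** 2 / (N1 * l1 ** 2 + N2 * l2 ** 2)
f_ratio = ss_comparison / error_ms

# Student's ratio for the same difference between means.
t_ratio = (T2 / N2 - T1 / N1) / (error_ms * (1 / N1 + 1 / N2)) ** 0.5

print(f"sum of squares = {ss_comparison:.1f}")
print(f"F = {f_ratio:.2f}, t squared = {t_ratio ** 2:.2f}")
```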

Orthogonal Comparisons. We have just seen that a combination
of totals can be made up which reflects some interesting aspect
of a statistical problem. The combination which was considered 
provided a comparison of elements of the problem, and was 
distributed with one degree of freedom. Theoretically every 
degree of freedom in a statistical problem can be used to provide 
information of some statistical interest. From this point of view 
each of the nine degrees of freedom for interaction in the reading 
problem should provide information of interest. It is usually 
difficult, however, to determine an element of interest that can 
be drawn from each of the degrees of freedom in a statistical 
problem. 

In the following paragraphs we shall develop procedures for 
subdivision of a sum of squares into components each of which 
has one degree of freedom. The general formulas will be applied 
to a subdivision of the sum of squares of remedial reading methods. 

Some definitions will be needed. Suppose that there are k
totals $T_1, T_2, \ldots, T_k$ based on $N_1, N_2, \ldots, N_k$ cases respectively.
Then a combination

\[ l_1 T_1 + l_2 T_2 + \cdots + l_k T_k \]

is called a comparison if

(14.7) \[ N_1 l_1 + N_2 l_2 + \cdots + N_k l_k = 0 \]

As an illustration recall that in the previous section we had
$k = 2$, $l_1 = 3$, $l_2 = -1$, $N_1 = 12$, $N_2 = 36$,
so that $N_1 l_1 + N_2 l_2 = 0$.

The sum of squares due to this comparison is

(14.8) \[ \frac{(l_1 T_1 + l_2 T_2 + \cdots + l_k T_k)^2}{N_1 l_1^2 + N_2 l_2^2 + \cdots + N_k l_k^2} \]

The sums of squares due to a set of comparisons will be independent
if the comparisons are orthogonal. Two comparisons
among the same totals

\[ l_1 T_1 + l_2 T_2 + \cdots + l_k T_k \]

and

\[ m_1 T_1 + m_2 T_2 + \cdots + m_k T_k \]

are orthogonal if

(14.9) \[ N_1 l_1 m_1 + N_2 l_2 m_2 + \cdots + N_k l_k m_k = 0 \]

If the number of cases is the same for all totals, that is

\[ N_1 = N_2 = \cdots = N_k \]

then a combination is a comparison if

\[ l_1 + l_2 + \cdots + l_k = 0 \]

and two comparisons are orthogonal if

\[ l_1 m_1 + l_2 m_2 + \cdots + l_k m_k = 0 \]
If as many comparisons are formed as there are degrees of 
freedom then the sums of squares of a set of orthogonal compari- 
sons constitute a complete subdivision of the total sum of squares. 
It should be noted that orthogonal sets of comparisons can be 
made up in an endless number of ways. 
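
Conditions (14.7) and (14.9) and the sum of squares (14.8) translate directly into a few lines of code. The Python below is a minimal sketch; the function names `is_comparison`, `are_orthogonal` and `comparison_ss` are assumptions made for this illustration, and the second set of multipliers and case counts is a made-up example of the equal-numbers case.

```python
def is_comparison(l, n, tol=1e-9):
    """Condition (14.7): the multipliers l, weighted by case counts n, sum to zero."""
    return abs(sum(ni * li for ni, li in zip(n, l))) < tol


def are_orthogonal(l, m, n, tol=1e-9):
    """Condition (14.9): the weighted sum of products of multipliers is zero."""
    return abs(sum(ni * li * mi for ni, li, mi in zip(n, l, m))) < tol


def comparison_ss(l, totals, n):
    """Formula (14.8): sum of squares, with one degree of freedom, for a comparison."""
    numerator = sum(li * ti for li, ti in zip(l, totals)) ** 2
    denominator = sum(ni * li * li for ni, li in zip(n, l))
    return numerator / denominator


# The illustration from the previous section: 3*T1 - T2 with 12 and 36 cases.
print(is_comparison([3, -1], [12, 36]))
print(round(comparison_ss([3, -1], [1266.3, 3964.0], [12, 36]), 1))

# A hypothetical equal-numbers case with three totals of 5 cases each.
print(are_orthogonal([1, 1, -2], [1, -1, 0], [5, 5, 5]))
```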
Let us apply the formulas to obtaining a set of orthogonal 
comparisons among the methods of remedial reading. From 
Table 14.1 we have 


Method of Instruction     Total
Alphabetic                $T_1 = 1267.3$
Kinesthetic               $T_2 = 1347.2$
Phonic                    $T_3 = 1247.2$
Visual                    $T_4 = 1368.6$

The following comparisons suggest themselves as being of
interest:

The alphabetic and phonic methods jointly in contrast with
kinesthetic and visual jointly. This comparison has the form
$T_1 + T_3 - T_2 - T_4$.

The alphabetic in contrast with the phonic method, with the
form $T_1 - T_3$.

The kinesthetic in contrast with the visual, $T_2 - T_4$.

The multipliers of $T_1$, $T_2$, $T_3$, $T_4$ which describe these comparisons
are then as follows:

        T_1    T_2    T_3    T_4
         1     -1      1     -1
         1      0     -1      0
         0      1      0     -1

For any row the sum of the multipliers is zero. For any two rows
the sum of products of corresponding multipliers is zero. Hence
the conditions for orthogonality are satisfied.
The sums of squares for the three comparisons are

\[ \frac{(T_1 + T_3 - T_2 - T_4)^2}{12 + 12 + 12 + 12} = 844.20 \]

\[ \frac{(T_1 - T_3)^2}{12 + 12} = 16.83 \]

\[ \frac{(T_2 - T_4)^2}{12 + 12} = 19.08 \]

Total 880.11


The reader should compare 880.11 which is the total of the 
three sums of squares with the entry 880.1 which is the sum of 
squares for methods of remedial instruction in Table 14.3. The 
two are the same except for rounding errors. 
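
The subdivision can be verified numerically. The sketch below is an illustration; the dictionary keys and variable names are assumptions made for the example, while the totals and the 12 cases per total are those given in the text. It computes the three comparison sums of squares from formula (14.8) and compares their total with the sum of squares for rows obtained directly.

```python
# Totals for the four remedial methods, each based on 12 pupils.
totals = [1267.3, 1347.2, 1247.2, 1368.6]   # alphabetic, kinesthetic, phonic, visual
n = 12

comparisons = {
    "alphabetic + phonic vs kinesthetic + visual": [1, -1, 1, -1],
    "alphabetic vs phonic": [1, 0, -1, 0],
    "kinesthetic vs visual": [0, 1, 0, -1],
}

# Formula (14.8) with equal numbers of cases.
ss = {}
for name, l in comparisons.items():
    numerator = sum(li * ti for li, ti in zip(l, totals)) ** 2
    ss[name] = numerator / (n * sum(li * li for li in l))

# Sum of squares for rows computed directly, as in formula (14.3).
grand = sum(totals)
ss_rows = sum(t * t for t in totals) / n - grand ** 2 / (n * len(totals))

for name, value in ss.items():
    print(f"{name:45s} {value:8.2f}")
print(f"{'total of the three comparisons':45s} {sum(ss.values()):8.2f}")
print(f"{'sum of squares for rows':45s} {ss_rows:8.2f}")
```

Because the three comparisons are orthogonal and use all three degrees of freedom, the two totals agree exactly apart from floating-point error.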

The sum of squares for each of the three comparisons has one
degree of freedom and can be tested for significance using the
mean square of individuals within subclasses from Table 14.3 as
the error mean square. The first comparison is highly significant,
for F = 844.2/31.5 = 26.8 and $F_{.99}$ = 7.50. Neither of the
other comparisons is significant.

The Estimation of Error. The denominator used in setting up
the F ratio is called the "error mean square," the "error variance,"
or briefly the "error." The problem of choosing the appropriate
sum of squares for error requires careful consideration. The
choice differs from situation to situation.

Mathematically the numerator mean square and denominator
mean square in F must be estimates of the same population
variance in order that the F distribution shall apply. Let us see
how this consideration applies to the problem of remedial reading.
In this problem the entire population of children sampled in the
experiment may be considered as stratified into 16 cells or subclasses.
Under each of the three null hypotheses considered above
the means which appear in the numerator mean square are affected
only by variations among individuals within the subclasses. Consequently
if the null hypothesis is true each numerator mean
square is an estimate of the same population variance as is estimated
by the variation within subclasses.

The situation was different in the problem of Chapter 9 when
the performance of individuals at archery when shooting at the
50-yard range in first, second and third order was considered.
There each individual was measured on each of the three orders.
For eleven individuals there were 33 cells or subclasses with one
score in each. If that one score were really the mean of all possible
scores made by the given individual shooting at the 50-yard range 
in the given order, then the residual variance of Table 9.9 would 
be an interaction variance. Here the interaction would be between 
individual and order and would be due to the differential response 
of individuals to such factors as warming up, practice, and fatigue. 

However the one score shown in a cell is not the mean but is 
merely one score out of an infinite population of scores correspond- 
ing to that cell. As representative of the cell it involves an error 
of measurement, and thus variation within cells is also present 
in the residual variance of Table 9.9. 

When there is only one score in a cell, the only error variance 
available is the residual variance which includes both the variance 
among subclass means and the variance among individuals within 
subclasses. These are confounded and cannot be disentangled. 
There is no possibility of testing interaction against variance 
within cells. 

Now let us consider the numerator mean square, that is the 
mean square among means of orders. These means are subject
to sampling variation of two kinds. If new individuals are observed,
the total level of performance of those individuals (their
row means) will not affect the comparison among the order means 
but their differential response to order may. 

As new individuals are observed, new subclasses arise and if 
there is interaction among the subclass means such interaction 
is properly a part of the error variance for testing order difference. 
The variation among the order means is also affected by the variation
within subclasses, for each score involves a measurement
error. Thus the numerator mean square among order means (that
is, column means in Table 9.9) is an estimate of the same population
variance as is the residual variance. The residual variance represents
the discrepancy of the subclass observations from the estimate
provided by row and column means, and it includes both
interaction and variation within subclasses.
When there is only one case in each subclass this discrepancy
is the only measure of error available and no problem
of choice arises.

To illustrate a situation in which a choice of error variance 
must be made we may consider a problem structurally similar to 
that of the 11 archers except that several scores are available in 
each subclass. Four methods of instruction are carried on in
each of five schools, with several persons in each school taught by 
the same method. Here there are four times five or twenty sub- 
classes. However, these subclasses are not a stratification of the 
population since the schools are themselves a sample out of all 
possible schools just as the 11 archers were a sample out of all 
possible archers. The variation among means of methods is due 
in part to variation within each of the observed twenty subclasses 
but also in part to variation of subclass means from the values 
they would have if they were completely dependent upon their 
row and column means. The latter variation is measured by the 
interaction mean square. 

Hence in a situation like the one just described a test is first 
made for significance of interaction by using the variation within 
subclasses as error. If the interaction mean square is significantly 
greater than the mean square within subclasses that interaction 
mean square is used as the error variance for testing significance 
of row and column effects. If the interaction mean square is not 
significantly greater than the variance within subclasses then this 
interaction may be considered an estimate of the same population 
variance as the mean square within subclasses. In such case the 
sum of squares for interaction and for variation within subclasses 
can be advantageously combined to form a new error variance for 
testing row and column effects. These procedures will be dis- 
cussed. 

Mathematical Models for the Two-way Layout. The discussion 
of the previous section will now be amplified by a symbolic analysis 
based on mathematical models used in analysis of variance. 
The mathematical model which is used to describe the population 
plays an important part in determining how the analysis of 
variance is to be carried out. 

The effect of the mathematical model on the analysis of variance
is due to the logic of the F distribution. It has been stated
previously that the mean squares in the numerator of F and in 
its denominator must be estimates of the same variance in order 
that the F test may be made. For accurate use of the F test it is 
necessary to determine in each case whether this condition is 
satisfied. 


Consider first the mathematical model used in the experiment
on remedial reading. In this model the mean of each cell is
assumed to be the constant $\mu_{ij}$.
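TABLE 14.5 Population Values Estimated by Mean Squares in an Experiment
with r Rows, c Columns and n Individuals in each Cell

Source of        Mean square                                                                          Population value estimated
variation

Rows             $\frac{nc}{r-1}\sum_i(\bar{X}_{i.}-\bar{X})^2$                                         $\sigma^2 + \frac{nc}{r-1}\sum_i(\mu_{i.}-\mu)^2$

Columns          $\frac{nr}{c-1}\sum_j(\bar{X}_{.j}-\bar{X})^2$                                         $\sigma^2 + \frac{nr}{c-1}\sum_j(\mu_{.j}-\mu)^2$

Interaction      $\frac{n\sum\sum(\bar{X}_{ij}-\bar{X}_{i.}-\bar{X}_{.j}+\bar{X})^2}{(r-1)(c-1)}$        $\sigma^2 + \frac{n\sum\sum(\mu_{ij}-\mu_{i.}-\mu_{.j}+\mu)^2}{(r-1)(c-1)}$

Individuals
within
subclasses       $\frac{\sum\sum\sum(X_{ij\alpha}-\bar{X}_{ij})^2}{rc(n-1)}$                             $\sigma^2$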

Population values estimated in Table 14.5 are determined only
on this assumption. Each of the three hypotheses, $H_r$ for rows,
$H_c$ for columns and the hypothesis of no interaction,
modifies these values so that if the hypothesis is true the estimated
value becomes $\sigma^2$ in each case. Consider for example the hypothesis
$H_r$ which states that $\mu_{1.} = \mu_{2.} = \cdots = \mu_{r.} = \mu$. If this hypothesis
is true each difference $\mu_{i.} - \mu$ is zero. Hence the expression
$\sum(\mu_{i.} - \mu)^2$ is zero, and the estimated population value for rows
is $\sigma^2$. It follows that under the appropriate hypothesis each F
ratio formed in Table 14.3 satisfies the condition that the numerator
and denominator are estimates of the same variance.

Consider now the model for an experiment in which four 
remedial reading methods are tried in each of several school 
systems, the school systems being chosen at random from a popu- 
lation of school systems. Suppose that the teaching methods are 
set up in rows and the school systems in columns. The mean 
of a row can be regarded as the same for all individuals in the 
row but the mean of a column is a variable which depends on the 
particular school system chosen for that column. The mean of
a cell must take into account these components and their interaction.

The discussion will be clarified if the following symbols are
introduced.

$\mu_{i.}$ is the mean of a method of instruction. This is the same
(constant) for all samples. The mean of the $\mu_{i.}$ is $\mu$.

$a_j$ is a component due to the school system. This will vary
as different school systems are introduced. We shall consider
it a normally distributed variable with mean zero and variance $\sigma_a^2$.

$b_{ij}$ is a component due to the interaction of method of instruction
and school system. As different school systems are chosen,
$b_{ij}$ will vary. We shall consider $b_{ij}$ as distributed normally and
independently of $a_j$ with mean zero and variance $\sigma_b^2$.

With this symbolism the mean of a cell in the ith row and
jth column is

\[ \mu_{i.} + a_j + b_{ij} \]

In Table 14.6 are shown sources of variation and related estimates
of population values for the statistics in Table 14.2 under the
model just described.


TABLE 14.6 Population Values Estimated in a Two-way 
Layout When the Column Means are Variables 


Source of              Population value estimated
variation              by mean square

Rows                   $\sigma^2 + n\sigma_b^2 + \frac{nc}{r-1}\sum_i(\mu_{i.}-\mu)^2$
Columns                $\sigma^2 + n\sigma_b^2 + nr\sigma_a^2$
Interaction            $\sigma^2 + n\sigma_b^2$
Individuals
within subclasses      $\sigma^2$



Table 14.6 indicates that the F ratio cannot always have the
mean square within subclasses as a denominator. Under the
hypothesis that $\mu_{i.} = \mu$ for all row means, the value estimated
by the mean square for rows is $\sigma^2 + n\sigma_b^2$, while the mean square
within subclasses estimates $\sigma^2$.

The tests of significance are carried out as follows. First,
test for interaction using the mean square for interaction as
numerator and the mean square within subclasses as denominator.
Under the hypothesis that interaction is zero ($\sigma_b^2 = 0$) both
mean squares are estimates of $\sigma^2$.

The tests for rows and columns depend on the outcome of the
test for interaction. If the hypothesis that $\sigma_b^2 = 0$ is rejected,
then under the hypothesis that $\mu_{i.} = \mu$ the mean square
for rows estimates the same value as the mean square for interaction.
For columns, the hypothesis
that there is no column effect is given by $\sigma_a^2 = 0$. Under this
hypothesis, too, the F test is made by using the mean square for
interaction in the denominator.

If the hypothesis that interaction is zero is accepted then either
the mean square for interaction or the mean square within sub- 
classes may be used. However, it is preferable to combine the 
sum of squares for interaction with the sum of squares within 
subclasses, and similarly to combine the degrees of freedom. Then 
from this combination a mean square is found to use in the denomi- 
nator as estimate of the error variance. In this way more degrees 
of freedom are available in the denominator. 
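
The rule for choosing the denominator can be written as a small function. This is a sketch under stated assumptions: the function name `error_term`, its arguments and the numbers in the usage lines are hypothetical, invented only to show the arithmetic, and the decision whether interaction is significant is assumed to have been made beforehand by the F test described above.

```python
def error_term(ss_interaction, df_interaction, ss_within, df_within,
               interaction_significant):
    """Return (error mean square, degrees of freedom) for testing rows and columns.

    If interaction was judged significant, the interaction mean square is the
    error. Otherwise interaction and within-subclass sums of squares are pooled,
    and their degrees of freedom are pooled in the same way.
    """
    if interaction_significant:
        return ss_interaction / df_interaction, df_interaction
    pooled_df = df_interaction + df_within
    return (ss_interaction + ss_within) / pooled_df, pooled_df


# Hypothetical figures: interaction SS 180 on 12 d.f., within SS 400 on 40 d.f.
print(error_term(180.0, 12, 400.0, 40, interaction_significant=True))
print(error_term(180.0, 12, 400.0, 40, interaction_significant=False))
```

The second call shows the gain from pooling: the denominator rests on 52 degrees of freedom instead of 12.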

The experiment described under the second model is often 
referred to as a randomized block experiment, in analogy with 
agricultural experiments in which the same experiment is repeated 
on several blocks of land. Here the block is the school system. 
After the groups have been selected within each block, the methods 
of instruction are assigned randomly to the groups. 

Experimental Procedure Replicated on Each of Several Indi- 
viduals. In some experiments it is possible to carry out the entire 
experimental procedure on each individual in the study. A great 
advantage of carrying out an experiment in this way, when 
conditions permit it, is that differences between persons are 
eliminated from the comparisons among elements of the experi- 
ment. Another advantage is that fewer cases are needed, because 
several measures are obtained from each case. When several 
measures are obtained from the same individual, these measures
are correlated and the calculations must take the correlations
into account. Calculations appropriate for this kind of problem
will be considered.

As an example of an experiment replicated on each of several 
individuals, we shall consider once more the experiment on 
archery discussed in Chapter 9. Each subject in that study shot 
daily for six days at ranges of 50, 40 and 30 yards. The order of 
shooting at these ranges varied from day to day so that each 
subject shot each range twice in the first position, twice in second, 
and twice in third. In Chapter 9 only part of these data were
utilized. In Table 14.7 are given all the scores for the entire 
period for each of 11 subjects classified by range and by order of 
shooting that range. Each entry in the table is the sum of two 
scores. Thus, if on one day the ranges were shot 50-30-40 and 
on another day were shot 40-30-50, the two scores at range 30 would 
be combined to make the score for range 30 in the second order. 

Note that each score in Table 14.7 can be classified in
three ways as belonging to a range, an order and a subject. It 
TABLE 14.7 Scores of Eleven Archers Shooting at Three Ranges
and in Three Orders at each Range*

Score at given range and order†

                 30 yards              Total,
Archer        1       2       3        all ranges
A           305     255     286        1911
B           203     188     225        1369
C           201     184     238        1259
D           163     167     219        1231
E           267     182     259        1865
F           146     120     157         887
G           195     235     181        1231
H           162     166     213        1132
I           275     240     227        1794
J           170     110     126        1005
K           232     166     221        1509

             50 yards             40 yards             30 yards
Order       1     2     3        1     2     3        1     2     3       Total
Total     977  1239  1110     1706  1839  1638     2319  2013  2352      15,193

* Data from Schroeder.

† The number 1 indicates that the given range was the first of the three ranges
to be shot, and the numbers 2 and 3 are to have similar interpretations.



is convenient to think of each score as belonging to a cell in a
rectangular solid, one score to a cell. The solid may be pictured
as in Figure 14-1. Each of the 99 scores is represented as located
in one of the 99 cells of which this solid is composed. The 9 cells
containing the scores of the sixth archer have been indicated by
a cross section in which a shaded portion represents the cell containing
the score made by archer 6 shooting at 50-yard range in
first order.

Fig. 14-1. Three dimensional solid representing scores
made by 11 archers shooting in 3 orders at 3 ranges.




In the same manner in which marginal totals are obtained in 
a flat or two-way layout, marginal totals are obtained for this 
three-way layout. The marginal totals are represented as two- 
way layouts, one set of marginal totals for each pair of variables. 
The sets of marginal totals will be presented in Tables 14.8, 14.9,
and 14.10.


TABLE 14.8 Sum of Scores for All 11 Archers 
Classified by Order and Range 


             Sum of scores at given range
Order      50 yards    40 yards    30 yards      Total

1             977        1706        2319         5002
2            1239        1839        2013         5091
3            1110        1638        2352         5100

Total        3326        5183        6684       15,193

The reader may check that the nine entries giving sums for all 
subjects in Table 14.8 are the 9 entries in the lowest line of Table 
14.7. He may verify the entries in Tables 14.9 and 14.10 by adding 
appropriate entries in Table 14.7. For example the first entry 
in Table 14.9 is 419 and equals 114 + 182 + 123 in Table 14.7.

From the data in Table 14.7 and its subtables the various 
component sums of squares can be calculated. There are seven 
of these components. Three are the main effects, the effects 


TABLE 14.9 Sum of Scores for All 3 Orders
Classified by Archer and Range

             Sum of scores at given range
Archer     50 yards    40 yards    30 yards      Total

A             419         646         846         1911
B             339         414         616         1369
C             186         450         623         1259
D             294         388         549         1231
E             460         697         708         1865
F             158         306         423          887
G             240         380         611         1231
H             173         418         541         1132
I             462         590         742         1794
J             264         335         406         1005
K             331         559         619         1509

Total        3326        5183        6684       15,193

TABLE 14.10 Sum of Scores for All 3 Ranges
Classified by Archer and Order

             Sum of scores at given order
Archer         1           2           3         Total

A             667         678         566         1911
B             403         469         497         1369
C             405         433         421         1259
D             355         453         423         1231
E             647         521         697         1865
F             322         275         290          887
G             383         427         421         1231
H             336         419         377         1132
I             602         621         571         1794
J             359         331         315         1005
K             523         464         522         1509

Total        5002        5091        5100       15,193


of range, order, and subject. Three more are the interactions
between pairs of factors, called first order interactions. The
remaining one is the joint interaction between all three variables,
called the second order interaction.

As usual the calculations begin with the correction term, which
is

\[ \frac{(15{,}193)^2}{99} = 2{,}331{,}588 \]

Then the sums of squares for main effects are as follows:

Archers:
\[ \tfrac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1005)^2 + (1509)^2\right] - 2{,}331{,}588 = 134{,}395 \]
Range:
\[ \tfrac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] - 2{,}331{,}588 = 171{,}491 \]
Order:
\[ \tfrac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] - 2{,}331{,}588 = 178 \]

Note that in these calculations the number by which each sum
of squares is divided is the number of cases which enter into
each total.



Range × Order interaction from Table 14.8:
\[ \tfrac{1}{11}\left[(977)^2 + (1706)^2 + \cdots + (1638)^2 + (2352)^2\right] - \tfrac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] - \tfrac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] + \frac{(15{,}193)^2}{99} \]
\[ = 2{,}514{,}453 - 2{,}503{,}079 - 2{,}331{,}766 + 2{,}331{,}588 = 11{,}196 \]

Range × Archer interaction from Table 14.9:
\[ \tfrac{1}{3}\left[(419)^2 + (646)^2 + \cdots + (559)^2 + (619)^2\right] - \tfrac{1}{33}\left[(3326)^2 + (5183)^2 + (6684)^2\right] - \tfrac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1005)^2 + (1509)^2\right] + \frac{(15{,}193)^2}{99} \]
\[ = 2{,}656{,}966 - 2{,}503{,}079 - 2{,}465{,}983 + 2{,}331{,}588 = 19{,}492 \]

Order × Archer interaction from Table 14.10:
\[ \tfrac{1}{3}\left[(667)^2 + (678)^2 + \cdots + (464)^2 + (522)^2\right] - \tfrac{1}{33}\left[(5002)^2 + (5091)^2 + (5100)^2\right] - \tfrac{1}{9}\left[(1911)^2 + (1369)^2 + \cdots + (1509)^2\right] + \frac{(15{,}193)^2}{99} \]
\[ = 2{,}480{,}800 - 2{,}331{,}766 - 2{,}465{,}983 + 2{,}331{,}588 = 14{,}639 \]

Order × Range × Archer interaction:

Perhaps the simplest way to calculate the second order interaction
is to calculate it as a residual, subtracting from the total
sum of squares the component sums of squares which have just
been computed. The total sum of squares is obtained from
Table 14.7 as the sum of the squares of the 99 scores minus the
correction term,

\[ (114)^2 + (182)^2 + \cdots + (166)^2 + (221)^2 - \frac{(15{,}193)^2}{99} = 2{,}703{,}221 - 2{,}331{,}588 = 371{,}633 \]

Then the second order interaction sum of squares is

\[ 371{,}633 - 134{,}395 - 171{,}491 - 178 - 11{,}196 - 19{,}492 - 14{,}639 = 20{,}242 \]
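
The whole chain of computations can be reproduced from the marginal tables. The Python below is an illustrative sketch, not the authors' procedure: the variable and function names are assumptions made for the example, the cell values are those of Tables 14.8, 14.9 and 14.10 taken so that every row and column agrees with the printed marginal totals, and the total sum of squares 371,633 is copied from the text because the 99 individual scores are not all reproduced here.

```python
# Table 14.9: totals over the 3 orders, by archer, at 50, 40 and 30 yards.
RANGE_BY_ARCHER = {
    "A": (419, 646, 846), "B": (339, 414, 616), "C": (186, 450, 623),
    "D": (294, 388, 549), "E": (460, 697, 708), "F": (158, 306, 423),
    "G": (240, 380, 611), "H": (173, 418, 541), "I": (462, 590, 742),
    "J": (264, 335, 406), "K": (331, 559, 619),
}
# Table 14.10: totals over the 3 ranges, by archer, in orders 1, 2 and 3.
ORDER_BY_ARCHER = {
    "A": (667, 678, 566), "B": (403, 469, 497), "C": (405, 433, 421),
    "D": (355, 453, 423), "E": (647, 521, 697), "F": (322, 275, 290),
    "G": (383, 427, 421), "H": (336, 419, 377), "I": (602, 621, 571),
    "J": (359, 331, 315), "K": (523, 464, 522),
}
# Table 14.8: totals over the 11 archers; rows are orders, columns are ranges.
ORDER_BY_RANGE = [(977, 1706, 2319), (1239, 1839, 2013), (1110, 1638, 2352)]
TOTAL_SS = 371_633  # copied from the text


def term(totals, cases_per_total):
    """Sum of squared totals divided by the number of cases in each total."""
    return sum(t * t for t in totals) / cases_per_total


range_archer = list(RANGE_BY_ARCHER.values())
order_archer = list(ORDER_BY_ARCHER.values())
archer_totals = [sum(row) for row in range_archer]
range_totals = [sum(col) for col in zip(*range_archer)]
order_totals = [sum(col) for col in zip(*order_archer)]
grand = sum(archer_totals)

C = grand ** 2 / 99                      # correction term
A = term(archer_totals, 9)
R = term(range_totals, 33)
O = term(order_totals, 33)
RO = term([v for row in ORDER_BY_RANGE for v in row], 11)
RA = term([v for row in range_archer for v in row], 3)
OA = term([v for row in order_archer for v in row], 3)

ss = {
    "archers": A - C,
    "range": R - C,
    "order": O - C,
    "range x order": RO - R - O + C,
    "range x archer": RA - R - A + C,
    "order x archer": OA - O - A + C,
}
# Second order interaction obtained as a residual.
ss["order x range x archer"] = TOTAL_SS - sum(ss.values())

for name, value in ss.items():
    print(f"{name:24s} {value:10.0f}")
```

Since the sketch carries unrounded intermediate terms, a component may differ by a unit from the figure obtained in the text from rounded terms.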


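TABLE 14.11 Analysis of Variance and Test for First Order Interaction

Source of                  Degrees of    Sum of      Mean
variation                  freedom       squares     square       F      F.95    F.99

Range                          2         171,491     85,746
Order                          2             178         89
Archers                       10         134,395     13,440
Order × range                  4          11,196      2,799     5.53    2.61    3.83
Order × archer                20          14,639        732     1.45    1.84    2.37
Range × archer                20          19,492        975     1.93    1.84    2.37
Order × range × archer        40          20,242        506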



The number of degrees of freedom for each main effect in the 
analysis of variance table is one less than the number of classes for 
that trait. For a first order interaction the number of degrees 
of freedom is the product of the degrees of freedom for the two 
main effects which enter into the interaction. For the second 
order interaction the number of degrees of freedom is the product 
of the degrees of freedom for the three main effects in the inter- 
action. The sum of all degrees of freedom is one less than the 
number of cases in the study. 
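
These rules for degrees of freedom, and the mean squares and F ratios that follow from them, can be sketched as below. The code is an illustration with names chosen for the example; the sums of squares are copied from the computations above, and the F ratios for the first order interactions take the second order interaction mean square as denominator, as in Table 14.11.

```python
r, s, t = 3, 3, 11   # numbers of ranges, orders and archers

df = {
    "range": r - 1,
    "order": s - 1,
    "archers": t - 1,
    "order x range": (s - 1) * (r - 1),
    "order x archer": (s - 1) * (t - 1),
    "range x archer": (r - 1) * (t - 1),
    "order x range x archer": (s - 1) * (r - 1) * (t - 1),
}

# Sums of squares copied from the text.
ss = {
    "range": 171_491,
    "order": 178,
    "archers": 134_395,
    "order x range": 11_196,
    "order x archer": 14_639,
    "range x archer": 19_492,
    "order x range x archer": 20_242,
}

mean_square = {name: ss[name] / df[name] for name in df}
error = mean_square["order x range x archer"]
f_ratio = {
    name: mean_square[name] / error
    for name in ("order x range", "order x archer", "range x archer")
}

print("total degrees of freedom:", sum(df.values()))
for name, f in f_ratio.items():
    print(f"{name:16s} MS = {mean_square[name]:7.0f}  F = {f:.2f}")
```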

Sources of Variation. Before proceeding with tests of significance
a discussion of the sources of variation which enter into
the analysis of variance is desirable. The discussion will deal with 
the variables themselves, or the main effects, and with the inter- 
actions among the variables. 

1. The range. This will be considered as having the same 
fixed values, 30, 40 and 50 yards, for all samples. 


2. The order of shooting. This will be considered as having
the same fixed values 1, 2 and 3, for all samples.

3. The skill of the individual. The individuals will be con- 
sidered as a sample of all possible similar individuals. Skill, as 
represented by mean score of an individual in repeated trials, 
will vary from individual to individual, and, thus, mean skill 
will vary from sample to sample. To consider the mean skills 
of individuals as fixed from sample to sample would restrict the 
generalization to the individuals in the study and would make the 
study of little interest. Consequently the more general view will 
be adopted here. 

4. Interaction of range with order. Since both range and order 
are fixed for all samples their interaction is also fixed. 

5. Interaction of range with skill of individuals. Since the 
individuals vary from sample to sample, this interaction is a 
variable. 

6. Interaction of order with skill of individuals. For the same 
reason this interaction is also a variable. 

To show how this view of the sources of variation influences 
the analysis of variance we shall introduce symbolism with which 
to describe the mathematical model. 

The following subscripts will be used: $i$ indicating any one of
the $r$ ranges; $j$ indicating any one of the $s$ orders; $\alpha$ indicating
any one of the $t$ individuals.

The symbol $\mu_{ij}$ will be used for the population mean at the
ith range and jth order. The Greek letter $\mu$ indicates that this
mean is the same for all samples and thus is really a character of
the population from which the samples are drawn.

To describe the score $X_{ij\alpha}$ made by the $\alpha$th individual shooting
at range $i$ in order $j$, symbols will be needed for three additional
variables, and for these the Latin letters a, b, and c, with appropriate
subscripts will be used.


Symbol           Component                                        Expectation    Variance

$a_\alpha$       Extent to which the $\alpha$th individual              0        $\sigma_a^2$
                 differs from the population mean $\mu$
                 $a_\alpha = \mu_\alpha - \mu$

$b_{i\alpha}$    Extent to which the $\alpha$th individual              0        $\sigma_b^2$
                 shooting at range $i$ differs from what
                 is expected in terms of his own mean
                 and the range mean
                 $b_{i\alpha} = \mu_{i\alpha} - \mu_{i.} - \mu_\alpha + \mu$

$c_{j\alpha}$    Extent to which the $\alpha$th individual              0        $\sigma_c^2$
                 shooting in order $j$ differs from what
                 is expected in terms of his own ability
                 and the order mean
                 $c_{j\alpha} = \mu_{j\alpha} - \mu_{.j} - \mu_\alpha + \mu$


These three components are assumed to vary from individual
to individual, independently of each other, and to be normally
distributed. Then the score made by the $\alpha$th individual shooting
at range $i$ in order $j$ will differ from

$$\mu_{ij\alpha} = \mu_{ij} + a_\alpha + b_{i\alpha} + c_{j\alpha}$$

only through an error of measurement. All the possible scores
which can be made by a particular individual shooting at fixed
range and in fixed order may be considered as a normal population
with mean $\mu_{ij\alpha}$ and variance $\sigma^2$. Each cell of the table has such
a normal population, all populations having the same variance
but quite possibly differing as to mean. In the data under
consideration, however, we have only one observation from any one
such population. Under the assumption that the second order
interaction is zero in the population, the second order interaction
of the sample provides an estimate of $\sigma^2$.

On the basis of this mathematical model the mean squares in
Table 14.11 are estimates of the population values named in
Table 14.12.

TABLE 14.12 Population Values Estimated by the Mean Squares
of Table 14.11

Source of variation        Population values estimated

Range                      $\sigma^2 + s\sigma_b^2 + \frac{st}{r-1}\sum_i(\mu_{i.} - \mu)^2$
Order                      $\sigma^2 + r\sigma_c^2 + \frac{rt}{s-1}\sum_j(\mu_{.j} - \mu)^2$
Archers                    $\sigma^2 + s\sigma_b^2 + r\sigma_c^2 + rs\sigma_a^2$
Range × Order              $\sigma^2 + \frac{t}{(r-1)(s-1)}\sum_i\sum_j(\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu)^2$
Order × Archer             $\sigma^2 + r\sigma_c^2$
Range × Archer             $\sigma^2 + s\sigma_b^2$
Order × Range × Archer     $\sigma^2$


Tests of Significance for Interactions. By study of Table 14.12
the appropriate tests of significance can be ascertained. Consider
first the tests of significance for interaction.

1. Interaction of range with order. The hypothesis that this
interaction is zero is given by the expression

$$\mu_{ij} - \mu_{i.} - \mu_{.j} + \mu = 0$$

for each range and each order. Under this hypothesis the
mean square for range × order interaction and the mean
square for the order × range × archer interaction are independent
estimates of $\sigma^2$. Consequently the F ratio using these mean
squares is an appropriate test for the hypothesis that no
interaction exists between order and range.

The entry in Table 14.11 shows that this F has the value 5.53, 
so that the effect of order appears to depend on range. 

2. Interaction of archer with range. The hypothesis that this
interaction is zero is given by the expression

$$\sigma_b^2 = 0$$

Hence the mean squares for interaction of range × archer and the
second order interaction are both estimates of $\sigma^2$ and the F test
applies. The value of F to test this interaction, as found in
Table 14.11, is 1.93, which is significant at the 5% level, and the
hypothesis of no interaction may be rejected.

3. Interaction of archer with order. By an argument like the 
preceding the appropriate F test for this hypothesis is the ratio of 
mean square for order x archer to the mean square for the second 
order interaction. The F value is not significant at the 5% level.

Choice of Error Variance for the Main Effects. This choice
must be made in the light of a preliminary examination of the
first order interactions, such as is shown in Table 14.11. When
the null hypothesis can be accepted for all of the first order
interactions, then each of the four mean squares for interaction
is an estimate of $\sigma^2$. The appropriate procedure then is to pool
all the four sums of squares for interaction forming a new sum of
squares for error, to pool the corresponding degrees of freedom,
and to obtain a new estimate of $\sigma^2$ by dividing this sum of squares
for error by the sum of the degrees of freedom for the various
interactions. This estimate of the error variance forms the
denominator in the F ratio used to test each of the main effects.

In the archery problem, however, the preliminary analysis
indicates that one first order interaction may be treated as zero
and the other two may not be. It will therefore be necessary to
consider separately the estimate of error for each of the main
effects.

Since the purpose of the experiment was to determine the
effect of order only, the tests of range and of archers are of no
practical import and are introduced here only to illustrate the
appropriate procedures.

1. Test that range means are equal. Applying this hypothesis
to the quantity named in Table 14.12 in the line headed range, we
have $\frac{st}{r-1}\sum_i(\mu_{i.} - \mu)^2 = 0$, so the mean square for range under the
null hypothesis for range is an estimate of $\sigma^2 + s\sigma_b^2$ if archer × range
interaction exists and is $\sigma^2$ if archer × range interaction is zero
($\sigma_b^2 = 0$). In Table 14.11 this interaction has been tested and
found significant. Therefore the mean square for range and the
range × archer interaction are both estimates of $\sigma^2 + s\sigma_b^2$, and
the latter is the appropriate denominator to use in the F ratio.
Table 14.13 presents the final analysis for range.

2. Test that order means are equal. Applying this hypothesis to
the quantity named in Table 14.12 in the line headed order, we
have $\frac{rt}{s-1}\sum_j(\mu_{.j} - \mu)^2 = 0$, so the mean square for order is an
estimate of $\sigma^2 + r\sigma_c^2$ under the null hypothesis for order. Since
the preliminary analysis of Table 14.11 shows the archer × order
interaction non-significant, we pool its sum of squares with that
of the second order interaction and complete the analysis as in
Table 14.14.

TABLE 14.13 Analysis of Variance for Testing the Significance
of Range and of Archer in the Data from Table 14.11

Source of      n     Sum of      Mean      F         F.95    F.99
variation            squares     square
Range          2     171,491     85,746    87.944    3.49    5.85
Archers       10     134,395     13,440    13.785    2.35    3.37
Error         20      19,492        975

TABLE 14.14 Analysis of Variance for Testing the Significance
of Order in the Data from Table 14.11

Source of      n     Sum of      Mean      F         F.95    F.99
variation            squares     square
Order          2         178         89     .153     3.15    4.98
Error         60      34,881        581


3. Test that archer means are equal. This test presents a more
complex problem since the quantity named on the line headed
archer in Table 14.12 contains two first order interactions. For
our data the preliminary analysis has indicated that we may treat
$\sigma_c^2$ as zero but may not so treat $\sigma_b^2$. Under the null hypothesis
for archers, $\sigma_a^2 = 0$. Hence under that hypothesis the mean
square for archers is an estimate of $\sigma^2 + s\sigma_b^2$. As this same
quantity is also estimated by the archer × range interaction, the
latter is the appropriate estimate of error, and the analysis proceeds
as in Table 14.13.
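The three F ratios can be checked directly from the mean squares printed in Tables 14.13 and 14.14. The short sketch below only repeats that arithmetic; the constant and function names are ours, chosen for illustration, and the critical values must still be read from an F table.

```python
# Mean squares as printed in Tables 14.13 and 14.14 of the text.
MS_RANGE = 85746.0
MS_ARCHERS = 13440.0
MS_RANGE_X_ARCHER = 975.0   # error term for range and for archers
MS_ORDER = 89.0
MS_POOLED_ERROR = 581.0     # order x archer pooled with second order interaction


def f_ratio(numerator_ms: float, error_ms: float) -> float:
    """Return the ratio of a main-effect mean square to its error mean square."""
    if error_ms <= 0:
        raise ValueError("error mean square must be positive")
    return numerator_ms / error_ms


f_range = f_ratio(MS_RANGE, MS_RANGE_X_ARCHER)
f_archers = f_ratio(MS_ARCHERS, MS_RANGE_X_ARCHER)
f_order = f_ratio(MS_ORDER, MS_POOLED_ERROR)

print(f"range:   F = {f_range:.3f}")
print(f"archers: F = {f_archers:.3f}")
print(f"order:   F = {f_order:.3f}")
```

Range and archers share the range × archer mean square as denominator because both of their expected mean squares contain the term in $\sigma_b^2$; order uses the pooled error because the order × archer interaction was judged to be zero.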

If both of the first order interactions with archers had been
found significant, we could not have treated either $\sigma_b^2$ or $\sigma_c^2$ as
zero. Then no single interaction would provide a proper test for
the difference among archers, which is the test that $\sigma_a^2 = 0$.
The mean squares must now be combined to form a proper estimate
of error, and we shall use the following key:

Symbol     Mean square
$k_0$      Archers
$k_1$      Archer × order interaction
$k_2$      Archer × range interaction
$k_3$      Archer × order × range interaction

Then

(14.10)    $F = \dfrac{k_0}{k_1 + k_2 - k_3}$

with $(t - 1)$ degrees of freedom for the numerator and

(14.11)    $n = \dfrac{(k_1 + k_2 - k_3)^2}{\dfrac{k_1^2}{(s-1)(t-1)} + \dfrac{k_2^2}{(r-1)(t-1)} + \dfrac{k_3^2}{(r-1)(s-1)(t-1)}}$

degrees of freedom for the denominator. (The reader may wish to
be reminded that $t$ has been used to indicate the number of archers,
$r$ the number of ranges and $s$ the number of orders.)
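A minimal sketch of this combined test follows. It assumes that formula (14.11) has the usual form for a linear combination of mean squares, each squared mean square being divided by its own degrees of freedom. The function name `quasi_f` and the two interaction mean squares marked hypothetical are ours; only the archer and range × archer mean squares come from Table 14.13.

```python
def quasi_f(k0, k1, k2, k3, r, s, t):
    """Approximate F test for archers when both first order interactions exist.

    k0: archers mean square
    k1: archer x order mean square, (s-1)(t-1) degrees of freedom
    k2: archer x range mean square, (r-1)(t-1) degrees of freedom
    k3: archer x order x range mean square, (r-1)(s-1)(t-1) degrees of freedom
    """
    error = k1 + k2 - k3
    if error <= 0:
        raise ValueError("combined error estimate must be positive")
    f = k0 / error
    df_numerator = t - 1
    df_denominator = error ** 2 / (
        k1 ** 2 / ((s - 1) * (t - 1))
        + k2 ** 2 / ((r - 1) * (t - 1))
        + k3 ** 2 / ((r - 1) * (s - 1) * (t - 1))
    )
    return f, df_numerator, df_denominator


# r = 3 ranges, s = 3 orders, t = 11 archers, as implied by the degrees of
# freedom in Tables 14.13 and 14.14. k1 and k3 are hypothetical values.
f, df_num, df_den = quasi_f(k0=13440.0, k1=700.0, k2=975.0, k3=520.0, r=3, s=3, t=11)
print(f"F = {f:.3f} with {df_num} and about {df_den:.1f} degrees of freedom")
```

The denominator degrees of freedom are generally fractional, so in practice the nearest tabled value, or the next smaller one to be conservative, would be used.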

Latin Square and Graeco-Latin Square. In problems which 
involve classification into several categories on each of several 
variables, it is often desirable to take observations for all combina- 
tions of the values of the different variables. By arranging the 
observations in one of the forms known as the Latin Square or the 
Graeco-Latin Square a large amount of information can be 
extracted from a relatively small number of observations. In the 
analysis of the data so obtained it is necessary to make the assump- 
tion that interactions are zero, and the investigator should recog- 
nize that he is making this assumption. 

Suppose, for example, that the director of a statistical
laboratory is about to buy new machines and has several models under
consideration. He wants to study the speed with which a skilled
computer can complete a given amount of work on each machine.
Obviously, if one person should perform the same computations on
each of several machines there would probably be a practice effect.
Consequently he must perform different computations, and there
is a risk that those computations may not be equally difficult. The
solution therefore requires that there be as many computers as
machines and as many problems as machines, and that each
computer use each machine once and solve each problem once,
and that each problem be computed once on each machine. In
the following scheme, each row represents a machine, each column
represents a problem, and the letters A, B, C, D and E represent
computers.

                      Problem
                 1    2    3    4    5
Machine I        A    B    C    D    E
Machine II       B    C    D    E    A
Machine III      C    D    E    A    B
Machine IV       D    E    A    B    C
Machine V        E    A    B    C    D

               Latin Square

The 24 degrees of freedom among the 25 observations may be
assigned as follows:

    Machines          4
    Problems          4
    Computers         4
    Residual error   12
    Total            24
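The defining property of the Latin Square, that each computer appears once in every row and once in every column, is easy to check mechanically. The sketch below builds one such square by cyclic shifting, which is only one of many valid arrangements, and verifies the property; the function names are ours.

```python
def cyclic_latin_square(symbols):
    """Build an n x n Latin square by shifting the symbol list one place per row."""
    n = len(symbols)
    return [[symbols[(row + col) % n] for col in range(n)] for row in range(n)]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    n = len(square)
    symbols = set(square[0])
    if len(symbols) != n:
        return False
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all({square[r][c] for r in range(n)} == symbols for c in range(n))
    return rows_ok and cols_ok


square = cyclic_latin_square(list("ABCDE"))
for label, row in zip(["I", "II", "III", "IV", "V"], square):
    print(f"Machine {label:<4}", " ".join(row))

# 25 observations give 24 degrees of freedom; three factors take 4 each.
residual_df = (5 * 5 - 1) - 3 * (5 - 1)
print("residual degrees of freedom:", residual_df)
```

In an actual experiment the rows, columns and letters of such a square would ordinarily be permuted at random before use.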


Now suppose it appears that the speed of a computer may be
affected by the order in which he uses the machines. Then the
experimental design must provide that each machine shall be
used first by one computer, second by another, third by another,
fourth by another, and last by the remaining computer. If now
the small Greek letters, $\alpha$ (alpha), $\beta$ (beta), $\gamma$ (gamma), $\delta$ (delta),
and $\epsilon$ (epsilon), be used to represent the order in which a machine
is used, the design may be called a Graeco-Latin Square. (The
same name, Graeco-Latin, is customarily applied even if the
second factor is indicated by numerals or by small Latin letters
instead of Greek letters.)

The 24 degrees of freedom now are assigned as follows:

    Machines          4
    Problems          4
    Computers         4
    Orders            4
    Residual error    8
    Total            24

Graeco-Latin Square (the diagram for Machines I to V is not recoverable from the source)


There are 4 variables (machines, operators, problems, orders).
The data are classified into 5 categories on each of these variables.
In the 25 observations recorded every category of one variable
occurs once and only once in combination with every category
of every other variable. Thus each row of the pattern displays
every category of every variable except the row variable, and each
column displays every category of every variable except the
column variable.
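A Graeco-Latin Square of side 5 can be sketched by superimposing two Latin Squares whose letters pair off without repetition. The construction below, which places Latin letter number (i + j) and Greek letter number (2i + j), both reduced modulo n, in row i and column j, is one standard device for odd n and is an assumption of this illustration, not necessarily the arrangement printed in the text.

```python
def graeco_latin_square(latin, greek):
    """Superimpose two orthogonal cyclic squares; valid when n is odd."""
    n = len(latin)
    if n != len(greek) or n % 2 == 0:
        raise ValueError("this construction needs two alphabets of the same odd size")
    return [
        [(latin[(i + j) % n], greek[(2 * i + j) % n]) for j in range(n)]
        for i in range(n)
    ]


def is_graeco_latin(square):
    """Each letter once per row and column, and each pair of letters once overall."""
    n = len(square)
    pairs = [cell for row in square for cell in row]
    if len(set(pairs)) != n * n:
        return False
    for part in (0, 1):  # 0 = Latin letter, 1 = Greek letter
        for k in range(n):
            row_symbols = {square[k][j][part] for j in range(n)}
            col_symbols = {square[i][k][part] for i in range(n)}
            if len(row_symbols) != n or len(col_symbols) != n:
                return False
    return True


square = graeco_latin_square(list("ABCDE"), ["α", "β", "γ", "δ", "ε"])
for row in square:
    print("  ".join(latin + greek for latin, greek in row))
print("valid:", is_graeco_latin(square))
```

Because every computer meets every order exactly once, the four degrees of freedom for orders can be separated from the residual without disturbing the comparisons among machines, problems, or computers.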

Application of Graeco-Latin Square to Study on Memorizing
Music. In the standard Latin Square or Graeco-Latin Square,
there is one observation in each cell. We now shall consider an
arrangement of that sort obtained by Rubin-Rabson from an
experiment in the memorizing of piano music and subsequently
shall consider an adaptation of the arrangement in which there
are several scores in each cell. Each subject came to the studio
on four successive days. On each day he memorized one
composition and exactly three weeks later he returned to the studio and
relearned the same composition. The time in minutes required
for the first learning of a composition, the time for its relearning 
and the reduction in that time (that is, the amount by which the 
time for the first learning exceeded the time for relearning) are 
the basic data of the study. Each of the four compositions was 
studied by a different method, and the comparison of the four 
methods is the chief goal of the study. The effect upon method 
difference caused by difference in difficulty of compositions and 
in order of presentation of the compositions has been controlled 
by the experimental design. Therefore, the variation in composi- 
tions and in order of presentation must not be allowed to con- 
tribute to the error variance in the statistical analysis. 

The four methods to be compared will now be briefly described. 
The reader who is interested in more exact details should consult 
the original study. 


A: Seated at a table the subject studied the musical score and its
analysis for 20 minutes. Then he carried the score to the piano and
practiced until he had memorized it. When he could play it through
twice without errors and without stopping he was admonished not to
practice it further and to think of it as little as possible.

B: This was the same as A except while the subject studied the score
he was told to write down any relationship concerning voice movement,
chords, etc., which he thought might help him in memorizing the
composition, and was allowed 25 minutes for the pre-study.

C: There was no preliminary study period. Subject went directly
to the piano but was warned to play the composition for the first time
at a speed slow enough to avoid errors.

D: Subject was seated before a phonograph and given the musical
score. He listened to four successive repetitions of the composition while
he followed the score.

In each method, time was clocked from the beginning of the first
playing of the composition until it was played through twice
without errors. Three weeks later the subject returned to relearn
the compositions in the same order as before but without any
preliminary study.

Thirty-six subjects were tested in 4 groups according to the
following scheme, where P, Q, R and S represent the 4
compositions and A, B, C and D the 4 methods:


First Second Third Fourth 
Group day day day day 
1 AP BQ CR DS 
2 DR CS BP AQ 
3 BS AR DQ CP 
4 CQ DP AS BR 


From the description of the original study it is not clear whether
the four groups should be considered as a classification variable
or not. If individuals were not assigned to the groups at random,
but groups were selected on the basis of some principle of
convenience or existing class structure, then the variation from group
to group might be large in relation to variation among individuals
within groups and also large in relation to residual error. In that
case the variation among group means should be taken out as a
principal component. On the other hand, if individuals were
assigned to groups by some purely random procedure the variation
among groups is due to sampling error and should be a part of the
error variance. It is true that no two groups were exposed to
the same combination of method and composition on the same
days, but that is of no importance if the assumption of zero
interaction is justified. The interaction cannot be tested in this
arrangement but is explicitly assumed to be zero. Inasmuch as
the printed study does not state how persons were assigned to
groups, it is not possible to be sure how groups should be treated
in the analysis.

As a first analysis we shall ignore the variation among persons
and shall consider as basic data the sum, for the nine individuals
in one group, of the number of minutes by which the time for the
original learning of a composition exceeded the time for relearning
it. These data are presented in Table 14.15.

There are 16 observations yielding 15 degrees of freedom. If
we consider group as a variable there are 4 variables (method,
composition, day and group), each of which has 3 degrees of
freedom. There remain 15 − (3 + 3 + 3 + 3) = 3 degrees of
freedom for error. Each F ratio obtained will thus have 3 degrees
of freedom for the numerator variance and 3 for the denominator
variance, and so must exceed $F_{.95}$ = 9.28 to be significant at the
.05 level. It is evident that this design cannot detect small
differences among the methods.

TABLE 14.15 Reduction in Learning Time in Minutes for Compositions P, Q,
R, and S When Studied by Methods A, B, C and D. (Each entry is the sum
for 9 subjects. Numerals 1, 2, 3, and 4 indicate group.)

Method    First day    Second day    Third day    Fourth day    Total
A         P1   80      R3   28       S4   59      Q2   40       207
B         S3  128      Q1   70.5     P2   96      R4   41.5     336
C         Q4   50      S2   57       R1    8      P3   62       177
D         R2   50.5    P4   94       Q3   92      S1   33.5     270

Total         308.5        249.5         255.0        177.0


The computation of the various sums of squares is shown in
Table 14.16 and the resultant analysis of variance in Table 14.17.
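The whole computation can be retraced in a few lines. In the sketch below the sixteen cell sums are those of Table 14.15, with illegible entries restored from the row and column totals (an assumption of this illustration); the variable and function names are ours.

```python
# (method, day, composition, group, reduction in minutes summed over 9 subjects)
CELLS = [
    ("A", 1, "P", 1, 80.0), ("A", 2, "R", 3, 28.0), ("A", 3, "S", 4, 59.0), ("A", 4, "Q", 2, 40.0),
    ("B", 1, "S", 3, 128.0), ("B", 2, "Q", 1, 70.5), ("B", 3, "P", 2, 96.0), ("B", 4, "R", 4, 41.5),
    ("C", 1, "Q", 4, 50.0), ("C", 2, "S", 2, 57.0), ("C", 3, "R", 1, 8.0), ("C", 4, "P", 3, 62.0),
    ("D", 1, "R", 2, 50.5), ("D", 2, "P", 4, 94.0), ("D", 3, "Q", 3, 92.0), ("D", 4, "S", 1, 33.5),
]
FACTORS = {"methods": 0, "days": 1, "compositions": 2, "groups": 3}


def factor_sum_of_squares(cells, position):
    """Sum of squares among the levels of the factor stored at `position`."""
    totals = {}
    for cell in cells:
        totals[cell[position]] = totals.get(cell[position], 0.0) + cell[4]
    per_level = len(cells) / len(totals)          # 4 observations per level
    grand = sum(cell[4] for cell in cells)
    return sum(v * v for v in totals.values()) / per_level - grand ** 2 / len(cells)


grand_total = sum(cell[4] for cell in CELLS)
total_ss = sum(cell[4] ** 2 for cell in CELLS) - grand_total ** 2 / len(CELLS)
ss = {name: factor_sum_of_squares(CELLS, pos) for name, pos in FACTORS.items()}
error_ss = total_ss - sum(ss.values())
error_ms = error_ss / 3                            # 15 - 4 * 3 = 3 degrees of freedom
f_ratios = {name: (value / 3) / error_ms for name, value in ss.items()}

for name in FACTORS:
    print(f"{name:<13} SS = {ss[name]:9.3f}  F = {f_ratios[name]:.2f}")
print(f"error         SS = {error_ss:9.3f}")
```

None of the four F ratios reaches the required 9.28, which illustrates how little power the design has with only 3 degrees of freedom for error.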

Latin-Square Pattern with Several Observations in Each Cell. 
The previous analysis of the time required for memorizing piano 
music under different methods of study did not make use of all 
the data available. The raw data of the study which may be seen 
in Table 14.18, indicate that there were 9 observations in each 


TABLE 14.16 Computation of the Relevant Sums of Squares
for the Data of Table 14.15

                            Methods      Compositions   Days          Groups
                            A   207      P   332        1   308.5     I     192
                            B   336      Q   252.5      2   249.5     II    243.5
                            C   177      R   128        3   255       III   310
                            D   270      S   277.5      4   177       IV    244.5

$\sum X$                    990          990            990           990
$\sum(\sum X)^2$            259974       267370.5       253776.5      252036.5
$\frac{1}{4}\sum(\sum X)^2$ 64993.5      66842.625      63444.125     63009.125
$990^2/16$                  61256.25     61256.25       61256.25      61256.25
Sum of squares              3737.25      5586.375       2187.875      1752.875

Total sum of squares:

$$\sum_{j=1}^{4}\sum_{i=1}^{4} X_{ij}^2 - \frac{\left(\sum\sum X_{ij}\right)^2}{16} = (80^2 + 128^2 + \cdots + 33.5^2) - \frac{990^2}{16} = 75187 - 61256.25 = 13930.75$$

Error sum of squares:

13930.75 − (3737.25 + 5586.375 + 2187.875 + 1752.875) = 666.375



TABLE 14.17 Analysis of Variance of Reduction in Learning Time for 4
Compositions Studied by 4 Groups Using 4 Methods on 4 Days

Source of variation    Sum of squares    d.f.    Mean square    F       F.95

Total                  13930.75          15
Methods                 3737.25           3      1245.75        5.61    9.28
Compositions            5586.375          3      1862.125       8.38    9.28
Days                    2187.875          3       729.29        3.28    9.28
Groups                  1752.875          3       584.29        2.63    9.28
Error                    666.375          3       222.125


the analysis of Table 14.17. As in the previous analysis, we shall 
work only with the difference in time required for initial learning 
and for relearning. There are 4 scores for each of 36 students,
or 144 scores in all.

For all 144 scores we have

$$\sum_{1}^{36}\sum_{1}^{4} X^2 = 15920.5 \qquad\text{and}\qquad \sum_{1}^{36}\sum_{1}^{4} X = 990$$

Then the total sum of squares is

15920.5 − (990)²/144 = 15920.5 − 6806.25 = 9114.25

To find the sum of squares among the 36 individuals we first
compute

$$\sum_{1}^{36}\Bigl(\sum_{1}^{4} X\Bigr)^2 = 33^2 + 55.5^2 + (-13)^2 + \cdots + 33^2 + 12.5^2 = 41962$$

Then the sum of squares is

$$\frac{1}{4}\sum_{1}^{36}\Bigl(\sum_{1}^{4} X\Bigr)^2 - \frac{1}{144}\Bigl(\sum_{1}^{36}\sum_{1}^{4} X\Bigr)^2 = 10490.5 - 6806.25 = 3684.25$$

To find the sum of squares among groups we first compute

$$\sum_{1}^{4}\Bigl(\sum\sum X\Bigr)^2 = 192^2 + 243.5^2 + 310^2 + 244.5^2 = 252036.5$$

Then the sum of squares is

$$\frac{1}{36}(252036.5) - \frac{990^2}{144} = 194.76$$

and the mean square is 194.76/3 = 64.92. This is 1/9 of the
value previously found, because now we are treating the group
mean as the mean of 36 observations and previously we treated it
as the mean of 4.

The sum of squares of individuals within groups is

3684.25 − 194.76 = 3489.49

and the mean square is 3489.49/32 = 109.05. The 32 degrees of
freedom may be arrived at by noting that from the 35 degrees of
freedom among 36 individuals there must be subtracted the
3 degrees of freedom among the 4 groups, or by noting that there
are 8 degrees of freedom for the 9 individuals in each of the
4 groups.

[...] in respect to the variation among individuals.

In the analysis in Table 14.17, if group differences are not
isolated they will form part of the estimate of error. In Table 14.19,
if group differences are not isolated they will form part of the
estimate of variation among individuals, which has here been
separated from the error variance. Therefore, in the analysis
of Table 14.17 if group differences are large it might be important
to isolate them, while in the analysis of Table 14.19, no matter
how large the differences among groups they cannot affect the
error variance. Since we are not particularly interested in
obtaining a correct estimate of the variation among individuals, there
is no particular reason for investigating group differences.

In passing it should be noted that it was necessary to have the 
same number of individuals in each cell, otherwise no solution 
could have been reached. 

The sum of squares among methods is

$$\frac{207^2 + 336^2 + 177^2 + 270^2}{36} - \frac{990^2}{144} = 7221.5 - 6806.25 = 415.25$$

which is exactly 1/9 of the value previously found. In the same
manner, the sum of squares among compositions and among days
will be exactly 1/9 of their former value.
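The factor of 9 can be confirmed numerically. The sketch below applies one small helper (the name `ss_from_totals` is ours) to the marginal totals of Table 14.16, first treating each total as the sum of 4 cell entries and then as the sum of 36 individual scores.

```python
def ss_from_totals(totals, n_per_total):
    """Sum of squares among classes, given class totals and observations per class."""
    grand = sum(totals)
    n = n_per_total * len(totals)
    return sum(t * t for t in totals) / n_per_total - grand ** 2 / n


MARGINAL_TOTALS = {
    "methods": [207.0, 336.0, 177.0, 270.0],
    "compositions": [332.0, 252.5, 128.0, 277.5],
    "days": [308.5, 249.5, 255.0, 177.0],
}

for name, totals in MARGINAL_TOTALS.items():
    from_cell_sums = ss_from_totals(totals, 4)      # 16 cell sums, 4 per class
    from_individuals = ss_from_totals(totals, 36)   # 144 scores, 36 per class
    print(f"{name:<13} {from_cell_sums:10.3f} {from_individuals:9.3f} "
          f"ratio = {from_cell_sums / from_individuals:.1f}")
```

The ratio is 9 in every case because each divisor in the second computation is nine times the corresponding divisor in the first, while the totals themselves are unchanged.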

The sum of squares among compositions is

$$\frac{332^2 + 252.5^2 + 128^2 + 277.5^2}{36} - \frac{990^2}{144} = 620.708$$

The sum of squares among days is

$$\frac{308.5^2 + 249.5^2 + 255^2 + 177^2}{36} - \frac{990^2}{144} = 243.10$$

Table 14.19 indicates a significant difference among methods,
among compositions, and among individuals, but not among days.


TABLE 14.19 Analysis of Variance of Reduction in Learning Time for 4 Musical
Compositions When Variation Among Individuals Is Used as Error Variance

Source of        Sum of       d.f.    Mean      F       F.95    F.99
variation        squares              square
Total            9114.25      143
Methods           415.25        3     138.4     3.3     2.70    3.98
Compositions      620.71        3     206.9     4.94    2.70    3.98
Days              243.10        3      81.0     1.93    2.70    3.98
Individuals      3684.25       35     105.3     2.5     1.54    1.83
Error            4150.94       99      41.9

Unequal Frequencies in the Subclasses. When the frequencies 
in the subclasses are unequal, the computation of sums of squares 
becomes very complex, if an exact solution is sought. A simple 
approximate method will be described. 

The method will be illustrated in relation to data in Table 13.1
on page 316. The data are based on scores of 90 students in a final
examination in elementary statistics. The three sections were
taught by different instructors. Students have been classified by
section and by sex. Each cell of Table 14.20 contains an entry
which shows the number of students in the subclass and the mean
score of those students.

TABLE 14.20 Number of Male and Female Students in Each of Three Sections
in a Course in Statistics and Mean Score on Final Examination for Each Subgroup.

(The body of Table 14.20 is not recoverable from the source.)

The sum of squares of deviations from subclass means is

$$240{,}579 - \sum \frac{(\Sigma X)^2}{N} = 6883.6$$

where the sum is taken over the six subclasses.

[...] subclasses is multiplied by the constant

$$\frac{1}{6}\left(\frac{1}{N_1} + \frac{1}{N_2} + \cdots + \frac{1}{N_6}\right) = .10720$$

In this expression 1/6 is the reciprocal of the number of subclasses
and the quantity within parentheses is the sum of the reciprocals
of subclass frequencies. The mean square for error is now

(76.48)(.10720) = 8.20

The sums of squares for rows, columns and interaction are now
computed by treating each mean in Table 14.20 as a single
observation. The computing procedure described in Table 9.13 on page 225
yields the analysis of variance in Table 14.21.


TABLE 14.21 Analysis of Variance of Data in Table 14.20

Source of       Sum of     Degrees of    Mean      F       F.95    F.99
variation       squares    freedom       square

Sex               6.4          1           6.4      .78    3.95    6.93
Section          99.3          2          49.6     6.05    3.10    4.85
Interaction      25.9          2          12.9     1.57    3.10    4.85
Error                         90           8.20

The analysis indicates a significant difference between sections 
but not between sexes. The non-significant interaction indicates 
that sex differences have not been found to vary from section to 
section. 
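The approximate method reduces to two steps: scale the within-subclass mean square by the average reciprocal of the subclass frequencies, then use the result as the error term for an analysis made on the subclass means. A minimal sketch follows; the function names and the example frequencies are hypothetical, while the figures 6883.6, 90, .10720 and the mean squares are those reported in the text.

```python
def unweighted_means_constant(cell_sizes):
    """Average of the reciprocals of the subclass frequencies."""
    return sum(1.0 / n for n in cell_sizes) / len(cell_sizes)


def adjusted_error_ms(within_ss, within_df, constant):
    """Within-subclass mean square put on the scale of a single subclass mean."""
    return within_ss / within_df * constant


# Figures reported in the text for the statistics examination data.
error_ms = adjusted_error_ms(within_ss=6883.6, within_df=90, constant=0.10720)
f_sex = 6.4 / error_ms
f_section = 49.6 / error_ms
f_interaction = 12.9 / error_ms
print(f"error MS = {error_ms:.2f}")
print(f"F: sex {f_sex:.2f}, section {f_section:.2f}, interaction {f_interaction:.2f}")

# Hypothetical frequencies, only to show how the constant is formed.
example_constant = unweighted_means_constant([10, 20, 10, 20])
print(f"constant for sizes 10, 20, 10, 20 = {example_constant:.3f}")
```

The method is approximate because it gives every subclass mean equal weight; it works best when the subclass frequencies do not differ greatly.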

Two Samples Matched in Subgroups on a Related Trait. Very
often the comparison of the means of two samples on a particular 
trait which we may call X is confused by the fact that the groups 
have a noticeably different distribution on some other trait, which 
we may call Y. If X and Y are independent, the difference between 
the groups on Y may be disregarded, but if X and Y are related, 
some method must be found of compensating for that difference. 
For example, suppose a research worker is interested in exploring 
possible effects upon the emotional adjustment of young children
of living in an orphanage as compared with living in foster homes.
If there should be a considerable difference in the age distribution
of the two samples studied, a difference in measure of emotional 
adjustment between the groups which was in whole or part due 
to difference in age might be erroneously interpreted as due to 
difference in home life. A similar effect might arise if the two 
groups were unlike in distribution of intelligence quotients, or as 
to the proportion of children who had been diagnosed as problem 
cases before admission to the institution. However a difference in 
the proportion who were color-blind or left-handed, or who disliked 
spinach might reasonably be disregarded. 

A very common method of eliminating the effect of such a 
difference in extraneous background traits which may be presumed 
to be related to the trait under scrutiny is that of matching 
individual for individual on the background traits. In a study of 
the sort referred to in the preceding paragraph, for example, a 
research worker might attempt to find pairs of children such that 
the members of each pair were of the same sex, of the same age 
within a specified limit, and of the same intelligence quotient 
within a specified limit. The prevalence of defects or of problem 
cases might be eliminated from consideration by limiting the study 
to children not known to be defective or maladjusted before 
admission. Each pair thus obtained is treated as one individual 
for whom there are two measures. The difference between those 
two measures is found, and analysis proceeds as in the problem 
on page 152.

There are several serious objections to this procedure. (1) It 
is usually very laborious. The search for closely matched cases 
often takes a very long time and the research worker quite properly 
feels that his energy could be better spent on something else. 
(2) Some of the cases, sometimes an alarmingly large number, 
have to be eliminated because no mates can be found. Thus 
sample size is reduced and reliability sacrificed. (3) Very often 
the cases finally retained at the conclusion of the matching process 
are not representative of either of the original populations. There- 
fore generalization from the sample of matched pairs can be made 
only to a universe of matched pairs and not to the two universes
as originally defined. 

Sometimes, instead of matching case for case, the two samples 
are made to have approximately the same mean and standard 
deviation on the background traits. In this procedure it is not 
necessary to have the two samples of the same size and fewer
cases are sacrificed. However, the work of matching is still
tedious and the samples obtained are usually not representative of
the original populations. A method proposed by Johnson and
Neyman, avoiding all three of the principal disadvantages of
case by case matching, will now be described.

The usefulness of this method may be illustrated by application
to a set of data obtained by W. W. Biddle. He had an
mental and a control group in each of six different schools, so that 
in effect his research was carried out six times. These groups 
varied in size. Certain environmental factors varied from school — 
to school, but were constant for the two groups in the same school. 
Each group was given a gullibility test designed to measure the 
ability to detect propaganda, the subject matter of this test being 
the Pacific relations of the United States. This test was given 
twice to all subjects, once a week or more before any experimental
material was presented to the experimental classes and once a
week or more after the close of the experimental teaching. The
gain in score made by each student is the individual trait studied.
Between the two administrations of the test, the experimental
classes studied specially prepared lessons entitled Manipulating
the Public, in which both subject matter and illustrations of
propaganda were taken largely from the period of World War I.
The control groups did not use this material. The data presented
in Table 14.22 differ somewhat from those in the appendix of
Biddle's study, being based on original data furnished by him.

TABLE 14.22 Gain in Measurement of Nationalist Gullibility for an
Experimental Group and a Control Group in Each of Six Schools.*

            Experimental group       Control group            Both groups
School      Frequency    Mean        Frequency    Mean        Frequency    Variance
            $N_{i1}$     $\bar X_{i1}$   $N_{i2}$     $\bar X_{i2}$   $N_i$        $s_i^2$

1              51        .496           26         .074          77        1.024
2              26        .202           35         .045          61         .856
3               8        .466           29         .315          37         .410
4              25        .246           12        −.23           37         .42
5              66        .241           16        −.245          82         .810
6              33        .450           29         .423          62         .817
Total         209                      147                      356

* Data from Biddle 2.

The data of Table 14.22 form a two-way layout of k rows
(6 schools) and 2 columns (2 methods) with disproportionate
frequencies in the 2k = 12 cells. The two groups from one school
may be considered as samples from a pair of populations matched
with respect to school and environmental influences but differing
with respect to the experimental variable method. Such data
are likely to show significant variation among the school means.
However, the focus of interest is not on school differences but on
method differences. Examination of the means in Table 14.22
shows that in each school the experimental group gained more
than the control, so the difference $\bar X_{i1} - \bar X_{i2}$ was positive. Do
these differences jointly indicate that the true population difference
is not zero?

The Johnson-Neyman method is based on three assumptions: 

(1) Within each of the 2k cells the observations are from a 
normal population; 

(2) The population variance is the same for all cells; 

(3) For the two matched populations from one school the mean
difference is the same as for any other school, so $\mu_{i1} - \mu_{i2} = d$.
This assumption is equivalent to the assumption that interaction
of school and method is zero.

Since there are two columns, the column difference has only 
one degree of freedom and the test for column difference can be 
made by use of ‘‘Student’s” distribution. We compute 


(14.12)    $D = \dfrac{\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}\,(\bar X_{i1} - \bar X_{i2})}{\sum_{i=1}^{k} \dfrac{N_{i1}N_{i2}}{N_i}}$

(14.13)    $s_D^2$ = (expression not recoverable from the source; its denominator contains the factor $N - k - 1$)

The statistic $t = D/s_D$ has "Student's" distribution with $N - k - 1$
degrees of freedom, where $N$ is the sum of all the cell frequencies.
For the data in Table 14.22

$D = .2707$ and $s_D^2 = .01435$

$t = \dfrac{D}{s_D} = 2.26 \qquad t_{.975} = 1.96$

The null hypothesis that there is no method difference may,
therefore, be rejected.

An interval estimate for the population difference $\mu_1 - \mu_2$ made
by the methods described in Chapter 7 is

$C(.035 < \mu_1 - \mu_2 < .506) = .95$
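The test and the interval can be retraced from the two reported quantities, D = .2707 and $s_D^2$ = .01435. The sketch below also includes a helper for the weighted difference, on the assumption that formula (14.12) weights each school's difference by $N_{i1}N_{i2}/N_i$; the helper is exercised only on hypothetical figures, since one entry of Table 14.22 is not fully legible. All names are ours.

```python
import math


def weighted_difference(cells):
    """Weighted mean of the experimental-minus-control differences.

    cells: one tuple per school,
    (n_experimental, mean_experimental, n_control, mean_control).
    """
    weights = [n1 * n2 / (n1 + n2) for n1, _, n2, _ in cells]
    differences = [m1 - m2 for _, m1, _, m2 in cells]
    return sum(w * d for w, d in zip(weights, differences)) / sum(weights)


# Quantities reported in the text for the data of Table 14.22.
D = 0.2707
S_D_SQUARED = 0.01435
T_CRITICAL = 1.96          # two-sided 5% point for large degrees of freedom

s_d = math.sqrt(S_D_SQUARED)
t_statistic = D / s_d
lower = D - T_CRITICAL * s_d
upper = D + T_CRITICAL * s_d
print(f"t = {t_statistic:.2f}")
print(f"95% interval: {lower:.3f} to {upper:.3f}")

# Hypothetical two-school example: equal weights, so D is the plain average.
hypothetical = [(10, 1.0, 10, 0.0), (10, 3.0, 10, 0.0)]
print("hypothetical D =", weighted_difference(hypothetical))
```

The limits agree with those printed in the text to within rounding in the third decimal place.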
15 Analysis of Covariance


The analysis of covariance is employed in the com- 
parison of groups on one variable when information is available 
on another variable correlated with it, or on several such variables. 

The variable on which the comparisons are made will be 
denoted by the letter Y. The related variables, which will be 
called predictor variables, will be denoted by the letters X and Z. 
The analysis will use regression equations by which the Y values 
are estimated from known values of the predictor variables. The 
mathematical model will be that of regression analysis described 
in Chapter 10, so that the values of the predictor variables are 
the same for all samples in the population, but the Y values differ 
from sample to sample. 

The methods to be described involve a subdivision of the sum of 
cross-products into components similar to the subdivision of sums 
of squares in the analysis of variance. Because the sums of cross- 
products are related to covariance as sums of squares are related 
to variance, methods involving subdivisions of cross-products 
are classified under the general title of analysis of covariance. 

Symbolism in the Analysis of Covariance. To facilitate computation
we shall introduce a set of symbols formed by the letter C
with subscripts to be described. Generally we write
(15.1)  $C_{xx} = \sum (X - \bar{X})^2 = \sum X^2 - (\sum X)^2/N$
(15.2)  $C_{yy} = \sum (Y - \bar{Y})^2 = \sum Y^2 - (\sum Y)^2/N$
(15.3)  $C_{xy} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - (\sum X)(\sum Y)/N$
Additional subscripts are used to distinguish total variation,
variation within groups, and variation between groups. The following
equations relate the usual partition of the total sum of
squares into portions within and between groups to the C notation:

(15.4)  $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})^2 = \sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)^2 + \sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})^2$

(15.5)  $C_{xxT} = C_{xxw} + C_{xxb}$
A similar relationship holds for the sum of cross-products:

(15.6)  $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})(Y_{i\alpha} - \bar{Y}) = \sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i) + \sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})(\bar{Y}_i - \bar{Y})$

(15.7)  $C_{xyT} = C_{xyw} + C_{xyb}$
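The partition in (15.4) to (15.7) can be checked numerically. The following is a minimal sketch; the scores are hypothetical values chosen only for illustration (they are not from the text), and the helper name `c_value` is an assumption.

```python
def c_value(u, v):
    """Sum of products of deviations: sum(UV) - sum(U) * sum(V) / N."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n

# Hypothetical scores for two small groups (not from the text).
groups_x = [[1, 3, 5], [4, 6, 8, 10]]
groups_y = [[2, 3, 7], [5, 9, 8, 12]]

all_x = [x for g in groups_x for x in g]
all_y = [y for g in groups_y for y in g]

# Total and within-groups sums of cross-products; "between" by subtraction.
c_xy_total = c_value(all_x, all_y)
c_xy_within = sum(c_value(gx, gy) for gx, gy in zip(groups_x, groups_y))
c_xy_between = c_xy_total - c_xy_within

# The same between-groups term computed directly from the group means.
mean_x = sum(all_x) / len(all_x)
mean_y = sum(all_y) / len(all_y)
direct_between = sum(
    len(gx) * (sum(gx) / len(gx) - mean_x) * (sum(gy) / len(gy) - mean_y)
    for gx, gy in zip(groups_x, groups_y)
)
```

Calling `c_value(all_x, all_x)` gives the corresponding sum of squares, so the same helper covers (15.1) to (15.3).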


SEVERAL POPULATIONS WITH ONE PREDICTOR VARIABLE 


In this situation, comparisons are made among means of a
variable Y in several populations. Information is also available
on a variable X. To make the problem concrete, consider the data
in Table 15.1. These are artificial data so contrived as to exhibit 
vividly the various relationships involved. Three groups, each 
consisting of 5 subjects have been given a prognostic test (X) 
before the beginning of a learning experiment, and an achievement 
test (Y) after the experiment. Each group is taught by a different 
method and is considered as a sample from a population identified 
by that method. We wish to test the hypothesis that the three 
methods are equally effective. 


TABLE 15.1 Prognostic Test Score (X) and Achievement Score (Y) for Each of
Fifteen Subjects in a Learning Experiment Employing Three Methods (I, II, and III)

Group I                  Group III
Subject    X    Y        Subject    X    Y
   1       2    5          11      20   20
   2       4    8          12      18   22
   3       5    7          13      23   26
   4       8    9          14      25   28
   5       6   11          15      24   24


The three samples have widely disparate means on the achievement
test ($\bar{Y}_1 = 8$, $\bar{Y}_2 = 10$, $\bar{Y}_3 = 24$), and a simple analysis of
variance solution applied to the Y scores yields F = 53.0, whereas
$F_{.95}$ is only 3.88. We note, however, that the groups have widely
disparate means on X also ($\bar{X}_1 = 5$, $\bar{X}_2 = 15$, $\bar{X}_3 = 22$). We may
ask whether the differences among the Y means can be explained
by the differences among X means. If such an explanation is 
valid, then the methods of instruction must be considered equally 
effective. The analysis related to this problem will be discussed 
in some detail. 

Numerical Computation of C Values. The computations to be 
described will require all the C values listed in Table 15.2. Com- 
puting these from the data of Table 15.1 and setting them down 
in systematic fashion will facilitate the work. The computation 
of several of these will be carried out explicitly as in the illustra- 
tion. The student should verify the others. 
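As a check on the between-groups entries of Table 15.2, the same quantities can be obtained from group totals alone. This is a sketch under stated assumptions: the group totals are those implied by the group means quoted in the text with five subjects per group, and the function name `between_groups` is hypothetical.

```python
def between_groups(totals_u, totals_v, sizes):
    """Between-groups sum of products from group totals:
    sum(T_ui * T_vi / N_i) - T_u * T_v / N."""
    n = sum(sizes)
    per_group = sum(tu * tv / ni for tu, tv, ni in zip(totals_u, totals_v, sizes))
    return per_group - sum(totals_u) * sum(totals_v) / n

sizes = [5, 5, 5]
x_totals = [25, 75, 110]   # implied by the X means 5, 15, 22
y_totals = [40, 50, 120]   # implied by the Y means 8, 10, 24

c_xxb = between_groups(x_totals, x_totals, sizes)
c_xyb = between_groups(x_totals, y_totals, sizes)
c_yyb = between_groups(y_totals, y_totals, sizes)
```

The three results agree with the between-groups values 730, 650, and 760 used in the later computations.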


TABLE 15.2 Sums of Squares and Sums of Products Obtained
from the Data of Table 15.1

Item         Group I          Group II         Group III        Within groups     Between groups     Total
$\sum x^2$   $C_{xx1} = 20$   $C_{xx2} = 34$   $C_{xx3} = 34$   $C_{xxw} = 88$    $C_{xxb} = 730$    $C_{xxT} = 818$
$\sum xy$    $C_{xy1} = 15$   $C_{xy2} = 5$    $C_{xy3} = 30$   $C_{xyw} = 50$    $C_{xyb} = 650$    $C_{xyT} = 700$
$\sum y^2$   $C_{yy1} = 20$   $C_{yy2} = 26$   $C_{yy3} = 40$   $C_{yyw} = 86$    $C_{yyb} = 760$    $C_{yyT} = 846$


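For the data in Table 15.1,

$C_{xx1} = 145 - \frac{(25)^2}{5} = 145 - 125 = 20$

$C_{xxT} = 3758 - \frac{(210)^2}{15} = 3758 - 2940 = 818$

$C_{xxw} = \sum_{i=1}^{3} C_{xxi} = 20 + 34 + 34 = 88$

$C_{xxb} = C_{xxT} - C_{xxw} = 818 - 88 = 730$
$\qquad = \frac{T_1^2}{N_1} + \frac{T_2^2}{N_2} + \frac{T_3^2}{N_3} - \frac{T^2}{N}$
$\qquad = \frac{(25)^2}{5} + \frac{(75)^2}{5} + \frac{(110)^2}{5} - \frac{(210)^2}{15} = 3670 - 2940 = 730$
$C_{xy1} = 215 - \frac{(25)(40)}{5} = 215 - 200 = 15$
$C_{xyT} = 3640 - \frac{(210)(210)}{15} = 3640 - 2940 = 700$
$C_{xyw} = \sum_{i=1}^{3} C_{xyi} = 15 + 5 + 30 = 50$
$C_{xyb} = C_{xyT} - C_{xyw} = 650$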
Comparison of Regression within the Groups with Regression
for the Total Group. If the regressions within the separate
groups are the same as that based on the total group, except for
sampling variability, then the effects of different methods of
instruction can be disregarded. Conversely, differences between
those regressions not attributable to sampling error may be
regarded as due to the methods of instruction.
The regression based on the total group is

(15.8)  $\hat{Y}_\alpha = \bar{Y} + b_T (X_\alpha - \bar{X})$
where
(15.9)  $b_T = \frac{C_{xyT}}{C_{xxT}}$

The regression for the ith group is
(15.10)  $\hat{Y}_{i\alpha} = \bar{Y}_i + b_i (X_{i\alpha} - \bar{X}_i)$
where
(15.11)  $b_i = \frac{C_{xyi}}{C_{xxi}}$

For the data in Table 15.1 we have, using the computations in that
table and in Table 15.2, for the regression based on the entire
group

$\hat{Y}_\alpha = 14 + .856(X_\alpha - 14)$
and for the regressions based on the separate groups

$\hat{Y}_{1\alpha} = 8 + .75(X_{1\alpha} - 5)$

$\hat{Y}_{2\alpha} = 10 + .147(X_{2\alpha} - 15)$

$\hat{Y}_{3\alpha} = 24 + .882(X_{3\alpha} - 22)$


A test of significance is called for to determine whether the
differences between the regression equations may be ascribed to
errors of sampling. Before proceeding with such a test we notice
that the regression lines in the populations may differ for two
reasons. (1) Their slopes may be different. (2) They may have
the same slope, but they may be parallel rather than coincident.
The first possibility is examined by testing the hypothesis

$H_1$: $\beta_1 = \beta_2 = \beta_3 = \beta_w$
The best sample estimate of that common within-groups slope
$\beta_w$ is provided by

(15.12)  $b_w = \frac{C_{xyw}}{C_{xxw}}$
For the data of Table 15.1,
$b_w = 50/88 = .568$
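All of the slopes used so far are ratios of a sum of products to a sum of squares, so they can be reproduced in a few lines. The sketch below assumes the sums of squares and products reported for the illustrative data; the dictionary layout is only a convenience.

```python
# Sums of squares (C_xx) and sums of products (C_xy) for the illustrative data.
c_xx = {"I": 20, "II": 34, "III": 34, "within": 88, "total": 818}
c_xy = {"I": 15, "II": 5, "III": 30, "within": 50, "total": 700}

# Each slope is C_xy / C_xx for the corresponding source of variation.
slopes = {name: c_xy[name] / c_xx[name] for name in c_xx}
rounded = {name: round(value, 3) for name, value in slopes.items()}
```

Here `rounded["within"]` is the common within-groups slope and `rounded["total"]` is the slope for the total group.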


If $H_1$ is true, the sample values $b_1$, $b_2$, and $b_3$ differ only because of
sampling error. Then substituting $b_w$ for each of these in (15.10)
would produce a regression estimate $\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)$ differing
from $\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)$ only because of sampling variability. If
these differences

$[\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)] - [\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)]$

are squared and summed for all individuals in all groups the result
is, after simplification,

(15.13)  $S_1 = \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}} - \frac{C_{xyw}^2}{C_{xxw}}$

To make a test of significance this sum of squares can be
compared with the sum of squares of deviations of individual
scores from the separate regression estimates. This is found as
the sum of squares of differences

$Y_{i\alpha} - [\bar{Y}_i + b_i(X_{i\alpha} - \bar{X}_i)]$
Calling this sum of squares $S_2$ we have, after simplification,

(15.14)  $S_2 = C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}}$
$S_1/\sigma^2$ and $S_2/\sigma^2$ are independently distributed as $\chi^2$ with k - 1
and N - 2k degrees of freedom respectively. Hence the ratio

(15.15)  $F = \frac{S_1}{S_2} \cdot \frac{N - 2k}{k - 1}$

has the F distribution with $n_1 = k - 1$ and $n_2 = N - 2k$.
For the illustrative data

$S_1 = \left(\frac{15^2}{20} + \frac{5^2}{34} + \frac{30^2}{34}\right) - \frac{50^2}{88} = 10.0$

$S_2 = 86 - \left(\frac{15^2}{20} + \frac{5^2}{34} + \frac{30^2}{34}\right) = 47.5$
N - 2k = 9,  k - 1 = 2
Then
$F = \frac{10.0}{47.5} \cdot \frac{9}{2} = .95$ with $n_1 = 2$ and $n_2 = 9$,
so that F is not significant. The regression lines in the three
populations may be assumed to have the same slope, of which $b_w$ is
the best estimate.
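The test of equal slopes can be traced step by step. This is a minimal sketch that assumes the group and within-groups C values of Table 15.2; the variable names are illustrative only.

```python
c_xx_groups = [20, 34, 34]
c_xy_groups = [15, 5, 30]
c_xx_w, c_xy_w, c_yy_w = 88, 50, 86
n_total, k = 15, 3

# Sum of squares accounted for when each group keeps its own slope.
own_slopes = sum(cxy**2 / cxx for cxy, cxx in zip(c_xy_groups, c_xx_groups))

s1 = own_slopes - c_xy_w**2 / c_xx_w                 # (15.13)
s2 = c_yy_w - own_slopes                             # (15.14)
f_slopes = (s1 / s2) * (n_total - 2 * k) / (k - 1)   # (15.15)
```

The ratio comes out near .95, which is well below any usual critical value for 2 and 9 degrees of freedom.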
Fig. 15-1. Paired scores for data of Table 15.1; the regression line for each
group; regression lines with common slope $b_w$ through group means; and the
differences in regression estimates resulting from use of $b_w$ instead of $b_i$.

At this point some readers will be trying to visualize $S_1$ and
$S_2$. On Figure 15-1 the paired scores of Table 15.1 have been
indicated by heavy dots. The three regression lines with slopes
$b_1$, $b_2$, and $b_3$ have been drawn through the means of their respective
groups. Three lines with common slope $b_w$ have been drawn
through those same means. For each individual the difference
in regression values obtained by using $b_w$ or $b_i$ is a line segment
shown on the graph as a dotted line. The sum of the squares
of all these lines is $S_1$. To obtain a picture of the line segments
whose squares constitute $S_2$, you should draw for each individual
his residual from the line with slope $b_i$. These residuals are drawn
parallel to the vertical axis extending from the dot which represents
the paired scores of an individual to the regression line for his
group. Some of them will overlap the dotted lines already drawn.

Test of the Hypothesis $\beta_w = 0$. It has been established that all
$\beta$'s may be treated as equal, but there remains the possibility that
they are all equal to 0. The test for the hypothesis that $\beta_w = 0$ is
provided by the variance ratio

(15.16)  $F = \frac{C_{xyw}^2}{C_{xxw} C_{yyw} - C_{xyw}^2} \cdot \frac{N - 2k}{1}$

which has the F distribution with $n_1 = 1$ and $n_2 = N - 2k$.

For our data  $F = \frac{50^2}{(88)(86) - 50^2} \cdot \frac{9}{1} = 4.44$

Inasmuch as $F_{.95} = 5.12$ we might decide to ignore the X scores
and to make our analysis on the Y scores alone. However, even
if such a course could be defended, with so few cases and F so near
the .05 value there is considerable risk of making a type-II error
and accepting the null hypothesis when it is false. Hence, it is
wiser to retain the X's in the analysis.
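The variance ratio for the hypothesis of zero slope uses only the three within-groups sums. A short sketch, assuming the within-groups values 88, 50, and 86 and the tabled critical value 5.12 quoted in the text:

```python
c_xx_w, c_xy_w, c_yy_w = 88, 50, 86
n_total, k = 15, 3
f_critical = 5.12  # tabled 95th percentile of F for 1 and 9 degrees of freedom

# (15.16): regression sum of squares over residual sum of squares, times N - 2k.
f_zero_slope = c_xy_w**2 / (c_xx_w * c_yy_w - c_xy_w**2) * (n_total - 2 * k)
falls_short = f_zero_slope < f_critical
```

The computed ratio falls a little short of the critical value, which is the situation the text warns about.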

Measure of Sampling Variability. The sum of the squares of
residuals from group regression lines divided by its degrees of
freedom is an unbiased estimate of the error variance. This value
is

(15.17)  $\frac{S_2}{N - 2k}$

which was used as error variance in (15.15) to test the hypothesis
$\beta_1 = \beta_2 = \cdots = \beta_k$. By accepting that hypothesis we established
that $S_1/(k - 1)$ may also be considered an estimate of the error variance,
and therefore that $S_1$ and $S_2$ may be pooled to obtain an estimate
with greater number of degrees of freedom:

(15.18)  $S_w = S_1 + S_2 = C_{yyw} - \frac{C_{xyw}^2}{C_{xxw}}$

and $S_w/\sigma^2$ is distributed as $\chi^2$ with N - k - 1 degrees of freedom.
In subsequent tests $S_w/(N - k - 1)$ will be used as estimate of
error variance. See Figure 15-2 for a graphic interpretation of $S_w$.

Fig. 15-2. Paired scores for data of Table 15.1; regression lines with common
slope $b_w$ through group means; and residuals from those regression lines.

Test of Hypothesis that a Single Regression Line Fits All
Populations. If it is true that a single common line fits all of the
populations, the regression estimates based on deviations within
groups from a line with slope $b_w$ [that is, $\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)$]
should be very similar to the regression estimates based on deviations
from the common mean and the regression line with slope $b_T$
[that is, $\bar{Y} + b_T(X_{i\alpha} - \bar{X})$]. The differences between these two

$[\bar{Y}_i + b_w(X_{i\alpha} - \bar{X}_i)] - [\bar{Y} + b_T(X_{i\alpha} - \bar{X})]$

would then be due to sampling error. The sum of the squares of
these differences may be reduced to

(15.19)  $S_b = C_{yyb} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}}$

and $S_b/\sigma^2$ is distributed as $\chi^2$ with k - 1 degrees of freedom. If
the hypothesis of a single regression line is true, $S_b/(k - 1)$ is
another estimate of sampling variance and will not be very different
in size from $S_w/(N - k - 1)$. If the hypothesis is not true,
$S_b/(k - 1)$ is expected to be larger than $S_w/(N - k - 1)$, which
is a proper estimate of error regardless of whether the hypothesis
is true or not. Then the quotient

(15.20)  $F = \frac{S_b}{S_w} \cdot \frac{N - k - 1}{k - 1}$

has the F distribution with $n_1 = k - 1$ and $n_2 = N - k - 1$. It may
be used to test the hypothesis of a single regression line for all
populations. For the illustrative data

$S_b = 760 + \frac{50^2}{88} - \frac{700^2}{818} = 189.4$

$S_w = 86 - \frac{50^2}{88} = 57.6$

$F = \frac{189.4}{57.6} \cdot \frac{11}{2} = 18.1$

For $n_1 = 2$ and $n_2 = 11$, $F_{.99}$ is only 7.20. The F ratio indicates
strongly that the hypothesis of a common regression line should
be rejected. In other words, the differences in achievement cannot
be wholly explained by differences in original ability but must be
at least partly ascribed to differences in the effectiveness of teaching
method.
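The test of a single regression line can be reproduced from the within-groups, between-groups, and total C values. A minimal sketch, assuming the values of Table 15.2; the variable names are illustrative.

```python
c_xx_w, c_xy_w, c_yy_w = 88, 50, 86
c_yy_b = 760
c_xx_t, c_xy_t = 818, 700
n_total, k = 15, 3

s_w = c_yy_w - c_xy_w**2 / c_xx_w                              # (15.18)
s_b = c_yy_b + c_xy_w**2 / c_xx_w - c_xy_t**2 / c_xx_t         # (15.19)
f_single_line = (s_b / s_w) * (n_total - k - 1) / (k - 1)      # (15.20)
```

The result is about 18.1, far above the tabled value 7.20 cited in the text for 2 and 11 degrees of freedom.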

Total Sum of Squares. An interesting sidelight on the sums of
squares $S_w$ and $S_b$ is that their sum is equal to the sum of squares
of deviations of individual Y values from the regression estimates
for Y if all groups are combined and the regression line has slope
$b_T = C_{xyT}/C_{xxT}$. Hence the relationship
(15.21)  $S_T = S_w + S_b$

corresponds to a subdivision of the total sum of squares into
portions as has been described in the analysis of variance. $S_T$ can
be computed independently of $S_w$ and $S_b$ by the formula

(15.22)  $S_T = C_{yyT} - \frac{C_{xyT}^2}{C_{xxT}}$

For the illustrative data $S_T$ is $846 - \frac{(700)^2}{818} = 246.98$

while $S_w + S_b = 57.6 + 189.4 = 247.0$

The sums of squares which are components of $S_T$ are listed with
their formulas in Table 15.3.
TABLE 15.3 Summary of Components of Sum of Squares of Y about Regression
Line $\hat{Y}_\alpha = \bar{Y} + b_T(X_\alpha - \bar{X})$ with the Formula and the Number of Degrees
of Freedom for Each

Nature of variation                                      Symbol               Formula                                                                                   Degrees of freedom

Group regression coefficients about common coefficient   $S_1$                $\sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}} - \frac{C_{xyw}^2}{C_{xxw}}$                    k - 1
Scores about regression line for their own group         $S_2$                $C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2}{C_{xxi}}$                                      N - 2k
Group means $\bar{Y}_i$ about regression line based       $S_3$                $C_{yyb} - \frac{C_{xyb}^2}{C_{xxb}}$                                                     k - 2
  on means
Difference between regression coefficient based on       $S_4$                $\frac{C_{xyb}^2}{C_{xxb}} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}}$       1
  means and common regression coefficient within groups
Scores about regression lines with common slope $b_w$    $S_w = S_1 + S_2$    $C_{yyw} - \frac{C_{xyw}^2}{C_{xxw}}$                                                     N - k - 1
Group means about regression line with slope $b_T$       $S_b = S_3 + S_4$    $C_{yyb} + \frac{C_{xyw}^2}{C_{xxw}} - \frac{C_{xyT}^2}{C_{xxT}}$                         k - 1
Scores about regression line for total group             $S_T$                $C_{yyT} - \frac{C_{xyT}^2}{C_{xxT}}$                                                     N - 2
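The way the four components add up to the total can be verified numerically. This sketch assumes the C values of Table 15.2 and the standard component formulas for sums of squares about a regression line; it illustrates the bookkeeping and is not a general routine.

```python
c_xx = {"groups": [20, 34, 34], "w": 88, "b": 730, "t": 818}
c_xy = {"groups": [15, 5, 30], "w": 50, "b": 650, "t": 700}
c_yy = {"w": 86, "b": 760, "t": 846}

def explained(cxy, cxx):
    """Sum of squares accounted for by a line with slope cxy / cxx."""
    return cxy**2 / cxx

own = sum(explained(xy, xx) for xy, xx in zip(c_xy["groups"], c_xx["groups"]))
within = explained(c_xy["w"], c_xx["w"])
between = explained(c_xy["b"], c_xx["b"])
total = explained(c_xy["t"], c_xx["t"])

s1 = own - within                # group slopes about the common slope
s2 = c_yy["w"] - own             # scores about their own group line
s3 = c_yy["b"] - between         # group means about the line through the means
s4 = between + within - total    # slope among means versus common within slope
s_t = c_yy["t"] - total          # scores about the line for the total group
```

Adding `s1 + s2` gives the pooled within-groups component, `s3 + s4` the between-groups component, and all four together reproduce `s_t`.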


Regression among the Means. If there is a common regression
line for all populations, then the pair of means for each population
must lie on that line. These means are $\bar{X}_i$ and $\mu_i$ where
$\mu_i = E(\bar{Y}_i)$ is the mean of Y for the ith population. However,
even if there is no common regression line for all populations
the means may still lie on a regression line determined by
those means as in the adjacent sketch. Here the three
solid lines have slope $b_w$ and the dotted line is the regression
line for means. This line would be estimated by formula

(15.23)  $\hat{Y}_i = \bar{Y} + b_b(\bar{X}_i - \bar{X})$

(15.24)  where $b_b = \frac{C_{xyb}}{C_{xxb}}$

The sum of the squares of the deviations of these means from their
estimate by (15.23) reduces to

(15.25)  $S_3 = C_{yyb} - \frac{C_{xyb}^2}{C_{xxb}}$

and $S_3/\sigma^2$ has a $\chi^2$ distribution with k - 2 degrees of freedom. The
more closely the means cluster around a straight line, the smaller
$S_3$ will be.

A test of the hypothesis that regression among the means is
linear is provided by the ratio

(15.26)  $F = \frac{S_3}{S_w} \cdot \frac{N - k - 1}{k - 2}$

which has the F distribution with $n_1 = k - 2$ and $n_2 = N - k - 1$.
For the illustrative data,

$S_3 = 760 - \frac{650^2}{730} = 181.2$
and
$F = \frac{181.2}{57.6} \cdot \frac{11}{1} = 34.6$
This value is much greater than $F_{.99} = 9.65$. It must be concluded
that regression among means is not linear (see Figure 15-3).

Comparison of the Adjusted Means. In the illustrative data
the differences among the means of Y ($\bar{Y}_1$, $\bar{Y}_2$, $\bar{Y}_3$) are in part due to
differences among teaching methods and in part to differences in
the X means. To ascertain what the differences among Y means
are because of differences among teaching methods alone, the
effects of differences among X means should be eliminated. This
elimination is achieved by adjusting all the means to a common
X value, which, for convenience, may be taken as the mean of
all X values, $\bar{X}$.

The adjustment is accomplished numerically by subtracting
from each mean the amount it gains through being associated with
an X mean which is above $\bar{X}$, or the loss through being associated
with an X mean below $\bar{X}$. The adjustment is $b_w(\bar{X}_i - \bar{X})$. The
adjustments are shown in Table 15.4.

The adjusted mean is least for Group II and greatest for
Group III when all X means are adjusted to $\bar{X}$. The tests of
significance have indicated that this difference cannot be explained
by sampling variability.
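The adjustment of the group means can be carried out directly. A minimal sketch, assuming the group means and the within-groups sums quoted for the illustrative data (common slope 50/88, general X mean 14):

```python
b_w = 50 / 88          # common within-groups slope
grand_x_mean = 14
x_means = {"I": 5, "II": 15, "III": 22}
y_means = {"I": 8, "II": 10, "III": 24}

# Subtract from each Y mean the part attributable to its X mean.
adjustments = {g: b_w * (x_means[g] - grand_x_mean) for g in x_means}
adjusted = {g: round(y_means[g] - adjustments[g], 2) for g in y_means}

lowest = min(adjusted, key=adjusted.get)
highest = max(adjusted, key=adjusted.get)
```

Under these assumptions the adjusted means are about 13.11, 9.43, and 19.45, so Group II is lowest and Group III highest, as stated in the text.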


Fig. 15-3. Means of groups indicated by crosses; regression line determined
by those means; and residuals of means from regression line.
$S_3$ = 5 (sum of squares of dotted lines).

Adjustment: $.57(\bar{X}_i - \bar{X})$
Adjusted Y mean: $\bar{Y}_i - .57(\bar{X}_i - \bar{X})$

MATCHED REGRESSION ESTIMATES. TWO POPULATIONS
WITH ONE PREDICTOR VARIABLE


If the categorical trait is dichotomous, a method of analysis is
available which is related to the foregoing but which has two great
advantages over it. (1) This procedure does not depend upon the
assumption that $\beta_1 = \beta_2$, and therefore it can be used in situations
where the former would not apply. (2) By this procedure it is
possible not only to explore the question of whether one of the
two populations (or methods, or treatments) exceeds the other
on the average but also to ask for what values of the predictor
variable it does so.

In the situation considered here, there is one predictor variable
X, and two populations; the regression of Y on X is assumed
to be linear but not necessarily the same for the two populations;
and Y is assumed to be normally distributed with constant variance
for each value of X. On page 406 we shall consider a similar
problem with two predictor variables.

Significance of the Difference $\hat{Y}_1 - \hat{Y}_2$ for a Particular Value
of X. From the data, separate regression estimates for Y can be
obtained from each group. These are

(15.27)  $\hat{Y}_1 = a_1 + b_1 X$  and  $\hat{Y}_2 = a_2 + b_2 X$
where  $b_i = \frac{C_{xyi}}{C_{xxi}}$  and  $a_i = \bar{Y}_i - b_i \bar{X}_i$

Now consider a specific value of X which may for convenience be
called X'. For this value the difference in regression estimates
for the two groups is

(15.28)  $D = \hat{Y}_1 - \hat{Y}_2 = (a_1 - a_2) + (b_1 - b_2)X'$

The adjacent sketch illustrates the two regression lines and the
value D for a selected X'. Obviously D is a variable which depends
upon X. At the point X' the two groups may be said to be
matched with respect to X, so that a comparison between $\hat{Y}_1$
and $\hat{Y}_2$ for X' is legitimate. If D should be zero for the entire range
of X, then clearly the two regression lines coincide and the groups
are indistinguishable with respect to Y.

If the lines cross, the value $X_0$ corresponding to the point where
they cross is the point at which D = 0 for the sample. This point
will be called the point of non-significance. The purpose of this
analysis is to define a region on the scale of X in which the value
of D is non-significant. This region will have $X_0$ somewhere
within it, probably near its center. On one side of this region
of non-significance there may be a region of significance in which
$\hat{Y}_1 > \hat{Y}_2$ and on the other side a region of significance in which
$\hat{Y}_2 > \hat{Y}_1$. Whether there are two, one, or no regions of significance
depends upon the size and distribution of the sample as well as
upon the position of the regression lines. If cases are few and the
lines close together the region of non-significance may extend over
all the observed values of X.

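The variance of D depends upon the variability of the groups,
the size of $N_1$ and $N_2$, the relation between X and Y, and the
distance of X' from $\bar{X}_1$ and from $\bar{X}_2$. That variance is given by
the formula

(15.29)  $s_D^2 = \frac{1}{N - 4}\left(C_{yy1} - \frac{C_{xy1}^2}{C_{xx1}} + C_{yy2} - \frac{C_{xy2}^2}{C_{xx2}}\right)\left[\frac{N_1 + N_2}{N_1 N_2} + \frac{(X' - \bar{X}_1)^2}{C_{xx1}} + \frac{(X' - \bar{X}_2)^2}{C_{xx2}}\right]$

For a fixed value of X' the ratio

$t = \frac{D}{s_D}$

can be calculated and its significance determined by reference to
"Student's" distribution with $N_1 + N_2 - 4 = N - 4$ degrees of
freedom.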
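The test at a single chosen value X' can be sketched as a small function. The formula for the variance of D used here is the standard one for the difference between two fitted regression lines, with the residual sums of squares pooled over N - 4 degrees of freedom. The input numbers are the summary statistics that the text reports for the reading data (Table 15.6); the function name and dictionary keys are hypothetical.

```python
import math

def difference_t(group1, group2, x_prime):
    """Return (D, s_D, t) for the difference in regression estimates at x_prime."""
    def line(g):
        slope = g["c_xy"] / g["c_xx"]
        intercept = g["y_mean"] - slope * g["x_mean"]
        return intercept, slope

    a1, b1 = line(group1)
    a2, b2 = line(group2)
    d = (a1 - a2) + (b1 - b2) * x_prime

    n = group1["n"] + group2["n"]
    groups = (group1, group2)
    # Pooled residual sum of squares about the two separate lines.
    residual_ss = sum(g["c_yy"] - g["c_xy"] ** 2 / g["c_xx"] for g in groups)
    # 1/N1 + 1/N2 plus the squared distances of x_prime from each group mean.
    spread = sum(1 / g["n"] + (x_prime - g["x_mean"]) ** 2 / g["c_xx"] for g in groups)
    s_d = math.sqrt(residual_ss / (n - 4) * spread)
    return d, s_d, d / s_d

therapy = {"n": 8, "x_mean": 0.03125, "y_mean": 1.0125,
           "c_xx": 0.4197, "c_yy": 2.2038, "c_xy": 0.6044}
non_therapy = {"n": 10, "x_mean": -0.19, "y_mean": 0.223,
               "c_xx": 0.6718, "c_yy": 0.1986, "c_xy": -0.0592}

d, s_d, t = difference_t(therapy, non_therapy, x_prime=0.0)
```

At X' = 0 the ratio t is a little above 4, to be referred to Student's distribution with 14 degrees of freedom.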

Region of Significance. The procedure outlined thus far necessitates
calculation of significance separately for each value of X'.
Sometimes there is such a single value of X in which we are particularly
interested and for which we wish to explore the difference
in $\hat{Y}_1$ and $\hat{Y}_2$. More often we wish an answer to the question,
"For what values of X is the difference $\hat{Y}_1 - \hat{Y}_2$ significant?"
Clearly, if two regression lines differ significantly at one point X',
they must differ significantly in some neighborhood near X'.
It is interesting and valuable to determine such a neighborhood,
which may be called the region of significance. This region will
now be determined.

Both D and $s_D$ are functions of the variable X. (We shall now
drop the prime for convenience.) The ratio $D/s_D$ has the t distribution
for each fixed value of X. Hence those values of X for which

$\frac{D}{s_D} > t_{1-\alpha}$

provide a one-tailed region of significance $\alpha$ with positive values of
$\hat{Y}_1 - \hat{Y}_2$ critical; those values for which

$\frac{D}{s_D} < -t_{1-\alpha}$

provide a one-tailed region of significance $\alpha$ with negative values
of $\hat{Y}_1 - \hat{Y}_2$ critical.
A two-sided region with significance level $\alpha$ is determined by
values of X for which $\frac{D^2}{s_D^2} > t_{1-\alpha/2}^2$. We shall now work out the
two-sided region, but all that is needed to obtain the one-sided
region is to substitute $\alpha$ for $\frac{1}{2}\alpha$ and to allocate critical values to the
appropriate end of the scale.

The two-sided region of significance consists of those values
of X which satisfy the inequality

(15.30)  $D^2 - s_D^2 t_{1-\alpha/2}^2 > 0$

This is an inequality in X since $D = a_1 - a_2 + (b_1 - b_2)X$ and it
may be written

(15.31)  $AX^2 + 2BX + C > 0$


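where

(15.32)  $A = (b_1 - b_2)^2 - \frac{t_{1-\alpha/2}^2}{N - 4}\left(C_{yy1} - \frac{C_{xy1}^2}{C_{xx1}} + C_{yy2} - \frac{C_{xy2}^2}{C_{xx2}}\right)\left(\frac{1}{C_{xx1}} + \frac{1}{C_{xx2}}\right)$

(15.33)  $B = (a_1 - a_2)(b_1 - b_2) + \frac{t_{1-\alpha/2}^2}{N - 4}\left(C_{yy1} - \frac{C_{xy1}^2}{C_{xx1}} + C_{yy2} - \frac{C_{xy2}^2}{C_{xx2}}\right)\left(\frac{\bar{X}_1}{C_{xx1}} + \frac{\bar{X}_2}{C_{xx2}}\right)$

(15.34)  $C = (a_1 - a_2)^2 - \frac{t_{1-\alpha/2}^2}{N - 4}\left(C_{yy1} - \frac{C_{xy1}^2}{C_{xx1}} + C_{yy2} - \frac{C_{xy2}^2}{C_{xx2}}\right)\left(\frac{N_1 + N_2}{N_1 N_2} + \frac{\bar{X}_1^2}{C_{xx1}} + \frac{\bar{X}_2^2}{C_{xx2}}\right)$

The value of X for which D = 0 is

(15.35)  $X_0 = \frac{a_2 - a_1}{b_1 - b_2}$

Bounding values for the region of significance may be obtained
by solving the equation $AX^2 + 2BX + C = 0$ where the numerical
values of A, B, and C are obtained from formulas (15.32) to
(15.34). The solution is

(15.36)  $X = \frac{-B \pm \sqrt{B^2 - AC}}{A}$

If $B^2 - AC > 0$, the equation has two solutions and those solutions
are bounding values such that the region of non-significance
lies between them, the region of significance outside them. If
$B^2 - AC < 0$, there is no region of significance. It is highly
unlikely that $B^2 - AC$ will be exactly zero.

Warning. Because formula (15.31) has been written with 2B
instead of B as the coefficient of X, the solution given in (15.36)
differs slightly from the form which may be familiar to the reader.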

Data reported by Bills* may be used to clarify the procedure.
Eighteen slow readers in the third grade were studied for three
periods of 30 days each. The first period was a control period,
the second was the therapy period, the third was a period for
studying cumulative effects of therapy. All the children were
given reading tests at the beginning and end of the first period,
and intelligence tests during the period, but received no other
experimental treatment. During the therapy period, 8 of the
children received play therapy, the other 10 did not. Reading
tests were given at the end of the therapy period. Nothing was
done with the children during the third period but they were
again tested at the end. The variable we shall use as criterion
will be the gain in reading score during the therapy period. Several
predictor variables are available such as chronological age, mental
age, initial reading score, but we shall at this time use only one,
the gain in reading score during the control period. The raw data
for the analysis are shown in Table 15.5 and the derived data in

TABLE 15.5 Gain on Reading Test for Each Child during
Control and Therapy Periods*

I. Therapy group                                II. Non-therapy group

Child   Gain during      Gain during            Child   Gain during      Gain during
        control period   therapy period                 control period   therapy period

  1         .00              .80                  1         .00              .20
  2         .10             1.35                  2        -.30             -.05
  3         .10              .90                  3         .00              .45
  4        -.45              .45                  4         .33              .15
  5        -.20              .65                  5        -.33              .28
  6         .25             2.05                  6         .00              .20
  7         .25             1.45                  7        -.60              .45
  8         .20              .45                  8        -.45              .15
                                                  9        -.20              .45
                                                 10        -.35              .25

* Data taken from Bills, R. E.
Table 15.6. Here X is used to represent gain during control period
and Y to represent gain during therapy period.


TABLE 15.6 Statistics Computed from the Data of Table 15.5
for Problem Using Method of Matched Regression Estimates

Statistic              I. Therapy group    II. Non-therapy group
$N_i$                        8                   10
$T_{yi} = \sum Y$            8.10                2.23
$\bar{Y}_i$                  1.0125               .223
$T_{xi} = \sum X$             .25               -1.90
$\bar{X}_i$                   .03125             -.19
$C_{xxi}$                     .4197               .6718
$C_{yyi}$                    2.2038               .1986
$C_{xyi}$                     .6044              -.0592
$b_i$                        1.4400              -.0881
$a_i$                         .9675               .2063
On Figure 15-4 the 8 small circles represent the paired scores
for the 8 children in the Therapy group. The regression equation
for this group

$\hat{Y}_1 = .97 + 1.44X$
is also drawn. The 10 small crosses on the same figure represent
the paired scores for the 10 children in the Non-therapy group.
The regression line for this group

$\hat{Y}_2 = .21 - .088X$
appears to the eye strikingly different from the first one. The
point of non-significance is

$X_0 = \frac{.2063 - .9675}{1.4400 + .0881} = -.50$

and on Figure 15-4 the two regression lines are seen to cross at a
point for which X = -.50 and Y = .25.

Substitution of the statistics (Table 15.6) in Formulas (15.32)
to (15.34) with $\alpha = .05$ yields

A = .40,  B = 1.06,  C = .44,  $X_1 = -5.0$,  $X_2 = -.22$.

There is a region of significance for X > -.22, and in that region
$\hat{Y}_1 > \hat{Y}_2$. The other region of significance, for which X < -5.0,
is of no practical interest because it is entirely outside the range of
the observed data. It may be concluded that children who gained
during the initial period or lost not more than .22 will make higher
gains in a subsequent period if they have therapy. For other
children, the evidence does not indicate whether they will do
better under therapy or not.
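The worked example can be retraced with a short script. This is a sketch under stated assumptions: the coefficients follow from expanding the inequality $D^2 - s_D^2 t^2 > 0$ in powers of X, the summary statistics are those of Table 15.6, and 2.145 is taken as the tabled 97.5th percentile of t for 14 degrees of freedom. Variable names are illustrative.

```python
import math

T_975_14DF = 2.145  # tabled value, assumed here rather than computed

n1, x1, y1, cxx1, cyy1, cxy1 = 8, 0.03125, 1.0125, 0.4197, 2.2038, 0.6044
n2, x2, y2, cxx2, cyy2, cxy2 = 10, -0.19, 0.223, 0.6718, 0.1986, -0.0592

b1, b2 = cxy1 / cxx1, cxy2 / cxx2
a1, a2 = y1 - b1 * x1, y2 - b2 * x2

# Pooled residual variance times t squared, the factor common to A, B, and C.
residual_ss = (cyy1 - cxy1**2 / cxx1) + (cyy2 - cxy2**2 / cxx2)
scale = T_975_14DF**2 * residual_ss / (n1 + n2 - 4)

A = (b1 - b2) ** 2 - scale * (1 / cxx1 + 1 / cxx2)
B = (a1 - a2) * (b1 - b2) + scale * (x1 / cxx1 + x2 / cxx2)
C = (a1 - a2) ** 2 - scale * (1 / n1 + 1 / n2 + x1**2 / cxx1 + x2**2 / cxx2)

x0 = (a2 - a1) / (b1 - b2)       # point of non-significance
disc = B**2 - A * C
bounds = None                    # stays None when there is no region of significance
if disc > 0:
    root = math.sqrt(disc)
    bounds = sorted([(-B - root) / A, (-B + root) / A])
```

With these inputs the upper bound is about -.22 and the crossing point about -.50, in agreement with the text. The lower bound is sensitive to rounding in A, so it may differ in the first decimal from the printed value.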


Fig. 15-4. Regression of gain during therapy period (Y) on gain during
control period (X) for two groups of children. (Data from Bills.3)

o = child in therapy group
x = child in non-therapy group
$X_0$ = -.50, point of non-significance
X > -.22 is region of significance favoring therapy group


SEVERAL POPULATIONS WITH TWO PREDICTOR VARIABLES 


This situation is a natural extension of that described on 
page 388 in which there was a single scaled predictor. However, 
because the regression equation now involves two predictor 
variables, changes must be made in the computational pattern 
and in the degrees of freedom.

The total sum of squares $S_T$ which is to be analyzed is the sum
of the squares of the residuals $Y_\alpha - \hat{Y}_\alpha$ where

$\hat{Y}_\alpha = \bar{Y} + b_x(X_\alpha - \bar{X}) + b_z(Z_\alpha - \bar{Z})$
This is the equation of a regression plane as discussed in Chapter
13. The three means $\bar{Y}$, $\bar{X}$, and $\bar{Z}$ are the means of the total group.
The two regression coefficients are also computed for the total
group. (Here $b_x$ and $b_z$ have been used as abbreviations for $b_{yx.z}$
and $b_{yz.x}$.) In a problem in analysis of covariance it is not usually
necessary to compute the regression equation but only to obtain $S_T$.
If it were desirable to compute the regression coefficients, that
could be easily done by the formulas

(15.37)  $b_{xT} = \frac{C_{xyT} C_{zzT} - C_{yzT} C_{xzT}}{C_{xxT} C_{zzT} - C_{xzT}^2}$  and  $b_{zT} = \frac{C_{yzT} C_{xxT} - C_{xyT} C_{xzT}}{C_{xxT} C_{zzT} - C_{xzT}^2}$

which are algebraically equivalent to Formulas (13.2) on page 319.
The total sum of squares computed as

(15.38)  $S_T = C_{yyT} - \frac{C_{xyT}^2 C_{zzT} + C_{yzT}^2 C_{xxT} - 2 C_{xyT} C_{yzT} C_{xzT}}{C_{xxT} C_{zzT} - C_{xzT}^2}$
has N - 3 degrees of freedom.
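The residual sum of squares about a regression plane can be computed from the six C values alone. The sketch below uses the standard two-predictor formula and hypothetical scores (not from the text); when Y is an exact linear function of X and Z the residual sum of squares should vanish, which gives a simple check.

```python
def c_value(u, v):
    """Sum of products of deviations: sum(UV) - sum(U) * sum(V) / N."""
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / len(u)

def residual_ss_two_predictors(x, z, y):
    """Sum of squared residuals of Y about its regression plane on X and Z."""
    c_xx, c_zz, c_yy = c_value(x, x), c_value(z, z), c_value(y, y)
    c_xy, c_yz, c_xz = c_value(x, y), c_value(z, y), c_value(x, z)
    det = c_xx * c_zz - c_xz**2   # must be nonzero (X and Z not collinear)
    explained = (c_xy**2 * c_zz + c_yz**2 * c_xx - 2 * c_xy * c_yz * c_xz) / det
    return c_yy - explained

# Hypothetical scores, chosen only for illustration.
x = [1, 2, 3, 4, 5, 6]
z = [2, 1, 4, 3, 6, 5]
y_exact = [3 + 2 * xi - zi for xi, zi in zip(x, z)]   # lies exactly on a plane
y_noisy = [yi + e for yi, e in zip(y_exact, [1, -1, 0, 0, 0, 0])]

ss_exact = residual_ss_two_predictors(x, z, y_exact)
ss_noisy = residual_ss_two_predictors(x, z, y_noisy)
```

The same function applied to within-group or between-group C values would give the corresponding component sums of squares.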

As on page 395, the total sum of squares may be separated
into two components $S_w$ and $S_b$ and each of these separated into
two others:

$S_w = S_1 + S_2$

$S_b = S_3 + S_4$

$S_T = S_w + S_b = S_1 + S_2 + S_3 + S_4$
Under the null hypothesis, each of these component sums divided
by its respective degrees of freedom is an unbiased estimate of
the common variance. The formulas for computing the various
sums and the degrees of freedom associated with each are shown in
Table 15.7.


TABLE 15.7 Summary of Components of Sum of Squares of Y about Regression
Plane $\hat{Y}_\alpha = \bar{Y} + b_{xT}(X_\alpha - \bar{X}) + b_{zT}(Z_\alpha - \bar{Z})$ with the Formula and Degrees
of Freedom for Each

Symbol   Formula                                                                                                                                   Degrees of freedom

$S_T$    $C_{yyT} - \frac{C_{xyT}^2 C_{zzT} + C_{yzT}^2 C_{xxT} - 2 C_{xyT} C_{yzT} C_{xzT}}{C_{xxT} C_{zzT} - C_{xzT}^2}$                          N - 3
$S_w$    $C_{yyw} - \frac{C_{xyw}^2 C_{zzw} + C_{yzw}^2 C_{xxw} - 2 C_{xyw} C_{yzw} C_{xzw}}{C_{xxw} C_{zzw} - C_{xzw}^2}$                          N - k - 2
$S_b$    $S_T - S_w$                                                                                                                               k - 1
$S_2$    $C_{yyw} - \sum_{i=1}^{k} \frac{C_{xyi}^2 C_{zzi} + C_{yzi}^2 C_{xxi} - 2 C_{xyi} C_{yzi} C_{xzi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$          N - 3k
$S_1$    $S_w - S_2$                                                                                                                               2(k - 1)
$S_3$    $C_{yyb} - \frac{C_{xyb}^2 C_{zzb} + C_{yzb}^2 C_{xxb} - 2 C_{xyb} C_{yzb} C_{xzb}}{C_{xxb} C_{zzb} - C_{xzb}^2}$                          k - 3
$S_4$    $S_b - S_3$                                                                                                                               2
The sums of squares listed in Table 15.7 are analogous in
meaning to those in Table 15.3 except that the phrase "regression
plane" is to be substituted for "regression line." The tests of
hypotheses are also analogous.


MATCHED REGRESSION ESTIMATES. TWO POPULATIONS
WITH TWO PREDICTOR VARIABLES


The procedures and the rationale employed in this situation
are analogous to those described on page 399 except that (1) the
computing formulas are somewhat more involved, (2) the figure
is three dimensional, and (3) the region of significance is not a
line segment but a portion of a plane.

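Regression estimates can be calculated for each group:

$\hat{Y}_1 = a_1 + b_{x1}X + b_{z1}Z$  and  $\hat{Y}_2 = a_2 + b_{x2}X + b_{z2}Z$

where

(15.39)  $b_{xi} = \frac{C_{xyi} C_{zzi} - C_{yzi} C_{xzi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.40)  $b_{zi} = \frac{C_{yzi} C_{xxi} - C_{xyi} C_{xzi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.41)  $a_i = \bar{Y}_i - b_{xi}\bar{X}_i - b_{zi}\bar{Z}_i$
For each pair of fixed values, X', Z', the significance of the
difference
(15.42)  $D = (a_1 - a_2) + (b_{x1} - b_{x2})X' + (b_{z1} - b_{z2})Z'$
may be tested.

The variance of D is

(15.43)  $s_D^2 = \frac{PQ}{N - 6}$

where

(15.44)  $P = \sum_{i=1}^{2} (C_{yyi} - b_{xi} C_{xyi} - b_{zi} C_{yzi})$

and

(15.45)  $Q = \frac{N_1 + N_2}{N_1 N_2} + \sum_{i=1}^{2} \frac{C_{zzi}(X' - \bar{X}_i)^2 - 2 C_{xzi}(X' - \bar{X}_i)(Z' - \bar{Z}_i) + C_{xxi}(Z' - \bar{Z}_i)^2}{C_{xxi} C_{zzi} - C_{xzi}^2}$

The ratio $t = D/s_D$ has "Student's" distribution with N - 6 degrees
of freedom.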
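Region of Significance. From the inequality $D^2 - s_D^2 t_{1-\alpha/2}^2 \ge 0$
a region of significance $\alpha$ may be obtained by a procedure analogous
to that on page 400, except that the region which, in that discussion,
lay on the X-axis, now lies in the XZ plane. From this
inequality, the limits of the region of significance may be found
by drawing the curve which is given by the second degree equation

(15.46)  $AX^2 + 2BXZ + CZ^2 + 2EX + 2GZ + H = 0$,
where

(15.47)  $A = (b_{x1} - b_{x2})^2 - \frac{P t_{1-\alpha/2}^2}{N - 6} \sum_{i=1}^{2} \frac{C_{zzi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.48)  $B = (b_{x1} - b_{x2})(b_{z1} - b_{z2}) + \frac{P t_{1-\alpha/2}^2}{N - 6} \sum_{i=1}^{2} \frac{C_{xzi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.49)  $C = (b_{z1} - b_{z2})^2 - \frac{P t_{1-\alpha/2}^2}{N - 6} \sum_{i=1}^{2} \frac{C_{xxi}}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.50)  $E = (a_1 - a_2)(b_{x1} - b_{x2}) + \frac{P t_{1-\alpha/2}^2}{N - 6} \sum_{i=1}^{2} \frac{C_{zzi}\bar{X}_i - C_{xzi}\bar{Z}_i}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.51)  $G = (a_1 - a_2)(b_{z1} - b_{z2}) + \frac{P t_{1-\alpha/2}^2}{N - 6} \sum_{i=1}^{2} \frac{C_{xxi}\bar{Z}_i - C_{xzi}\bar{X}_i}{C_{xxi} C_{zzi} - C_{xzi}^2}$

(15.52)  $H = (a_1 - a_2)^2 - \frac{P t_{1-\alpha/2}^2}{N - 6} \left[\frac{N_1 + N_2}{N_1 N_2} + \sum_{i=1}^{2} \frac{C_{zzi}\bar{X}_i^2 - 2 C_{xzi}\bar{X}_i\bar{Z}_i + C_{xxi}\bar{Z}_i^2}{C_{xxi} C_{zzi} - C_{xzi}^2}\right]$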

When the coefficients A to H have been calculated the curve
which is represented by equation (15.46) may be plotted on the
X, Z plane. The curve is a conic. The determination of conic
sections from a second degree equation is discussed in textbooks
on plane analytic geometry.

In order to assist the reader a brief discussion of the procedure
in plotting the conic will be given. The discussion will not be
complete, but will cover situations which are likely to arise in
practice.

The conic, if it exists at all, will be an ellipse or hyperbola, and
in general will be centered at a point other than the intersection
of the XZ axes and will be oblique to these axes. Plotting such
a conic is fairly laborious but the work can be considerably simplified
by locating new axes X', Z' in relation to which the conic is a
central conic and determining the equation of the conic in terms
of X', Z' coordinates.
We shall assume that $AC - B^2 \ne 0$.
The new axes intersect at the point $(X_0, Z_0)$ with coordinates

(15.53)  $X_0 = \frac{GB - CE}{AC - B^2}$
and

(15.54)  $Z_0 = \frac{BE - AG}{AC - B^2}$

This point is to be marked on the diagram. The new X' axis is
a line passing through the point $(X_0, Z_0)$ with slope

(15.55)  $m = \frac{C - A \pm \sqrt{4B^2 + (A - C)^2}}{2B}$

The sign of the radical in m is to be taken as the same as that of B.
The X' axis is given by the equation $Z = Z_0 + m(X - X_0)$. This
line should be drawn on the chart and a line drawn perpendicular
to it at $(X_0, Z_0)$.
In terms of the new X', Z' axes, the conic has the form

(15.56)  $A'(X')^2 + C'(Z')^2 + H' = 0$
where

(15.57)  $A' = (A + 2Bm + Cm^2)/(1 + m^2)$
(15.58)  $C' = (Am^2 - 2Bm + C)/(1 + m^2)$

the sign before the radical being the same as that of B, and
(15.59)  $H' = AX_0^2 + 2BX_0Z_0 + CZ_0^2 + 2EX_0 + 2GZ_0 + H$

It does not seem at all likely that in practice H' will be equal to
zero, hence we shall assume that it is not zero.

The type of curve represented by Equation (15.56) can be
determined by examination of the coefficients A', C', and H' as
follows:

(1) If A', C', and H' all have the same sign, there is no conic
and no region of significance;

(2) If A' and C' have the same sign, but H' has the opposite
sign, there is a region of significance bounded by an ellipse;

(3) If A' and C' have different signs, there is a region of
significance bounded by a hyperbola.
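The reduction to a central conic and the three-way classification can be sketched in code. The coefficients below are hypothetical values chosen so that the center is at (1, -1); the function names are assumptions, and the sketch presumes that $AC - B^2$, B, and H' are all nonzero.

```python
import math

def central_form(A, B, C, E, G, H):
    """Center, axis slope, and central-form coefficients of the conic
    A*X^2 + 2*B*X*Z + C*Z^2 + 2*E*X + 2*G*Z + H = 0.
    Assumes A*C - B**2 != 0 and B != 0."""
    det = A * C - B**2
    x0 = (G * B - C * E) / det
    z0 = (B * E - A * G) / det
    # The radical takes the same sign as B.
    radical = math.copysign(math.sqrt(4 * B**2 + (A - C) ** 2), B)
    m = (C - A + radical) / (2 * B)
    a_prime = (A + 2 * B * m + C * m**2) / (1 + m**2)
    c_prime = (A * m**2 - 2 * B * m + C) / (1 + m**2)
    h_prime = A * x0**2 + 2 * B * x0 * z0 + C * z0**2 + 2 * E * x0 + 2 * G * z0 + H
    return x0, z0, m, a_prime, c_prime, h_prime

def classify(a_prime, c_prime, h_prime):
    """Apply rules (1) to (3); all three coefficients are assumed nonzero."""
    if a_prime * c_prime < 0:
        return "hyperbola"
    if a_prime * h_prime > 0:
        return "no conic"
    return "ellipse"

# Hypothetical coefficients (not from the text).
x0, z0, m, a_p, c_p, h_p = central_form(A=2.0, B=1.0, C=2.0, E=-1.0, G=1.0, H=-1.0)
kind = classify(a_p, c_p, h_p)
```

For these coefficients the central form is $3(X')^2 + (Z')^2 - 3 = 0$, an ellipse whose X' axis has slope 1 through the point (1, -1).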

If the conic exists, it can best be plotted when Equation (15.56)
is put into the form

(15.60)  $\frac{(X')^2}{-H'/A'} + \frac{(Z')^2}{-H'/C'} = 1$
Fig. 15-5. Comparison of two groups on a test of superstitions.
From Austin Bond *

x Group I = Exp.
o Group II = Control
X Intelligence Score+2

Plotting this equation on the original XZ chart would be
difficult because the new axes are oblique to the lines of the grid.
However the graph of Equation (15.60) may be plotted on another
grid; then the axes of this grid may be aligned with the X', Z'
axes as drawn on the original grid, and the graph traced on the
original grid with tracing paper.

In addition to plotting the conic which outlines the region of
significance, it is desirable to draw the line of non-significance.
This line has the equation 


(15.61) (a; — аз) + (ба — Б) X + (ba — 54)Z = 0, 


where the a’s and b’s are as given by Formulas (15.39) to (15.41). 

Examples of the method may be found in an experiment
conducted by Bond in teaching genetics to college freshmen
in a course called "Science and Civilization." Experimental
and control groups had unequal distribution on intelligence as
measured by the American Council Psychological Test and unequal
distribution on the initial application of the various tests employed.
The symbols and formulas used by Bond follow the pattern in the
original paper by Johnson and Neyman. The formulas stated
in the preceding paragraphs of this text are equivalent to theirs
but use different symbols and permit an easier method of
computation. Bond's study presents eight figures comparing his two
groups on eight different tests. Some of the critical regions are
hyperbolas, some ellipses; in some the experimental method favors
one type of student, in some another.

One test dealt with common superstitions about heredity, the
score indicating the number of such superstitions accepted as true.
Sample data are presented in Table 15.8 and Bond's graph showing
the critical region is reproduced as Figure 15-5. Here intelligence
scores are distributed along the horizontal axis; initial scores on
the superstition test are distributed along the vertical axis; final
scores on the superstition test are not shown but are to be imagined
as distributed along a third axis perpendicular to the other two.
The experimental method produced better (that is, lower) final
scores over almost the entire range of the observations, and the
difference was significant for those who had originally accepted
at least two of the superstitions listed and who had average to
low intelligence.

Sample data from a test on opinions on imperialism, shown
graphically in Figure 15-6, present a somewhat different picture.
Looking at the line of non-significance we note that regardless of
the intelligence level the experimental method produced better
results on the final test for students whose initial test score was
low, while the method used with the control group produced better
results for students whose initial test score was high. However,
such differences are significant only for students in approximately
the lower third of the distribution of scores, and these derive
significant advantage from the experimental method.

Fig. 15-6. Comparison of two groups on a test of opinions on imperialism.
From Austin Bond. (Legend: x = Group I, Experimental; o = Group II,
Control. The region of significance is drawn for α = .01.)


TABLE 15.8 Sample Data for Comparison of Two Groups in Regard to
Superstitions Concerning the Laws of Heredity*

                                              Experimental      Control
Datum                                         group             group

Number of cases                               N1 = 54           N2 = 57
Sum of scores on final superstitions test     ΣY = 90           ΣY = 176
Sum of scores on initial superstitions test   ΣZ = 119          ΣZ = 173
Sum of scores on intelligence                 ΣX = 4460         ΣX = 4550
Mean final test score                         Ȳ1 = 1.667        Ȳ2 = 3.088
Mean initial test score                       Z̄1 = 2.204        Z̄2 = 3.035
Mean intelligence score                       X̄1 = 82.59        X̄2 = 79.82
Σ(Y − Ȳ)²                                     Cyy1 = 101.9      Cyy2 = 280.4
Σ(Z − Z̄)²                                     Czz1 = 172.6      Czz2 = 306.0
Σ(X − X̄)²                                     Cxx1 = 26806      Cxx2 = 26250
Σ(Y − Ȳ)(Z − Z̄)                               Cyz1 = 74.56      Cyz2 = 217.94
Σ(Y − Ȳ)(X − X̄)                               Cyx1 = −416.6     Cyx2 = −813.9
Σ(Z − Z̄)(X − X̄)                               Czx1 = −888.4     Czx2 = −300.4

Y = Final score on superstitions test
Z = Initial score on superstitions test
X = Score on American Council Psychological Examination
Large value of Y or Z indicates acceptance of superstitions

* From Bond
16


In the preceding chapters we have considered inferences 
based on sample moments such as the mean, variance and covari- 
ance, and on functions of these moments. In this chapter inferences 
based on percentiles will be discussed. 

The computation of percentiles is desirable in two classes of 
situations: 

1. When percentiles are actually more satisfactory statistics 
for estimation of population characteristics. For some populations 
the moments do not satisfy the conditions of consistency and 
efficiency as these are described in Chapters 3 and 7, but percen- 
tiles, or functions of percentiles, do satisfy the condition of con- 
sistency and at least approximate the condition of efficiency. 

2. When percentiles are less satisfactory statistics than the 
moments but serve to simplify computations. 

These situations and the related computations will presently 
be discussed in detail. 

Notation for Percentiles. In agreement with the notation
used in previous chapters, percentiles will be indicated by a symbol
based on the variable, and a subscript which is the decimal
equivalent of the percentile. Thus $X_{.20}$ is to be the notation for the
twentieth sample percentile of the variable X, $X_{.50}$ is to be the
median, etc. The corresponding population percentiles will be
denoted as $\xi_{.20}$, $\xi_{.50}$, etc. ($\xi$ is the Greek letter xi). The symbols $X_p$
and $\xi_p$ will indicate the (100p)th percentile, because p is the
decimal equivalent of the percent 100p. We shall need to refer
to the population ordinate at the (100p)th percentile. This will
be denoted as $y_p$.

Quantiles. The term quantile is often used in place of
percentile. It differs from the percentile only in the fact that it is
referred directly to the proportion instead of to the percent
equivalent of the proportion. Thus $X_{.25}$, in the present notation,
is the 25th percentile or the quantile of order .25. The terms
decile, quartile, and percentile are all subsumed under the term
quantile. Because the term percentile is in more common use
that term will be employed here.
Distribution of Percentiles in Large Samples. In large samples
$X_p$ has an approximately normal distribution with mean $\xi_p$ and
standard error

(16.1)  $\sigma_{X_p} = \dfrac{1}{y_p}\sqrt{\dfrac{p(1-p)}{N}}$

In particular, the median has an approximately normal
distribution with mean $\xi_{.50}$ and standard error

(16.2)  $\sigma_{X_{.50}} = \dfrac{1}{2y_{.50}\sqrt{N}}$

If sampling is from a normal population the ordinate at the
population median is $1/(\sigma\sqrt{2\pi})$. Therefore, for samples from a
normal population, the standard error of the median is

(16.3)  $\sigma_{X_{.50}} = \dfrac{\sigma\sqrt{2\pi}}{2\sqrt{N}} = \sqrt{\dfrac{\pi}{2}}\,\dfrac{\sigma}{\sqrt{N}} = \sqrt{1.57}\,\dfrac{\sigma}{\sqrt{N}}$
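
As a numerical illustration of the large-sample standard error of a percentile, the following Python sketch evaluates it for a normal population. The function names are hypothetical, and the formula coded is the usual large-sample result, standard error = sqrt(p(1 − p)/N) divided by the population ordinate at the percentile, which is assumed here to be what Formula (16.1) states.

```python
import math
from statistics import NormalDist


def percentile_standard_error(p, n, ordinate):
    """Large-sample standard error of the (100p)th sample percentile.

    `ordinate` is the population density at the (100p)th percentile.
    """
    return math.sqrt(p * (1 - p) / n) / ordinate


def normal_percentile_standard_error(p, n, sigma=1.0):
    """The same quantity when the population is normal with s.d. sigma."""
    population = NormalDist(mu=0.0, sigma=sigma)
    ordinate = population.pdf(population.inv_cdf(p))
    return percentile_standard_error(p, n, ordinate)


# Compare the median with the mean for N = 100 and sigma = 1.
se_median = normal_percentile_standard_error(0.50, n=100, sigma=1.0)
se_mean = 1.0 / math.sqrt(100)
print(round(se_median / se_mean, 3))   # ratio of the two standard errors
print(round((se_mean / se_median) ** 2, 2))   # ratio of the two variances
```

The first printed ratio is the square root of π/2, and the second is its reciprocal squared, 2/π, so the sketch gives a quick check on the constants used for the median in a normal population.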


Consistency of Estimate. From Formula (16.1) it is clear
that the standard error of a percentile approaches zero as N
becomes increasingly great. Hence, for large values of N there
is little likelihood that the sample percentile will differ greatly
from the population percentile, and the sample percentile is,
therefore, a consistent estimate of the population percentile.

In sampling from a normal population the mean and median
are both consistent estimates of μ. However there exist
populations such that the sample mean is not a consistent estimate of
the central parameter but the median is. These populations
are characterized by the fact that extreme cases are likely to occur
frequently in samples.

Efficiency of Estimate. It has been stated that, for samples
from a normal population, both the sample mean and sample
median are consistent estimates of μ. However, for such samples
the standard error of the mean is $\sigma/\sqrt{N}$ but the standard error
of the median is $\sqrt{1.57}\,\sigma/\sqrt{N}$. Since the mean has the smaller
standard error it is preferred to the median as an estimate of μ.

Similarly in samples from a normal population a consistent
estimate of σ is given by

$s = \sqrt{\dfrac{\Sigma(X - \bar{X})^2}{N - 1}}$

and also by $.74(X_{.75} - X_{.25})$. However, s is the preferred estimate
because its standard error is smaller.

In general there may be many consistent estimates of a population
characteristic. The estimate which has the least standard
error is called efficient. Other estimates, called inefficient, may,
nevertheless, be used because of computational convenience.
When inefficient estimates are used it is desirable to have a measure
of their efficiency. For this purpose the following measure of the
efficiency of a statistic has been adopted:

(16.4)  Efficiency of a given statistic = $\dfrac{\text{variance of efficient statistic}}{\text{variance of given statistic}}$

It is evident from the formula that an efficient statistic has
efficiency one, but an inefficient statistic has efficiency less than
one. When this formula is applied to the median of a sample
from a normal population, the efficiency of this statistic is

(16.5)  Efficiency of median = $\dfrac{2}{\pi} = .64$

An important interpretation of this concept of efficiency is that
it is a measure of a loss in number of cases. An efficiency of .80 is
equivalent to sampling reliability based on 80% of the cases in the
sample.

Classes of Efficient Statistics. Certain classes of statistics are 
known to be efficient. Among the best known of these are the 
maximum likelihood statistics. These are statistics which, when 
substituted for the parameter in expressions for the probability 
density, make the likelihood of the sample a maximum. Under
the usual assumptions, that samples are drawn from a normal or 
binomial population, most of the statistics which have been 
studied in this text, as the percent, mean, variance, correlation 
coefficient and regression coefficient are maximum likelihood 
statistics. These statistics not only have minimum standard 
error but are distributed normally for large samples. 

The maximum likelihood statistics are a subclass of a class of 
statistics known as BAN estimates, which have the common 
property of being normally distributed with minimum variance 
and with the true parameter value as mean. BAN stands for best 
asymptotically normal. For a fuller treatment of these classes
of estimates the reader must consult the mathematical literature
of statistics.
Estimation of the Mean and Standard Deviation by
Percentiles. It has been stated that the median has efficiency .64 as
an estimate of the mean of a normal population. More efficient
estimates can be obtained by computing averages of several
percentiles symmetrically placed about the median. Several
such estimates together with their efficiencies are listed in Table
XVII, in the Appendix. This table shows that very high efficiency
can be achieved by use of several percentiles. While even five 
percentiles provide an efficiency which is less than unity there 
may be an advantage in estimating a mean by these methods 
because of great saving of time. Such a saving will occur when 
a frequency distribution has data grouped in unequal class inter- 
vals. The method is particularly important when the ends of a 
distribution are open so that the mean cannot be computed at 
all by usual methods. 

Another occasion when the use of percentiles provides a saving 
of time is when data are punched on IBM cards. The cards can 
be sorted in order and the percentiles can be located by a simple 
count. 

When percentiles are used to estimate the standard deviation 
of a normal population the statistics listed in Table XVIII have 
high efficiency. 

The decision as to whether to use an efficient statistic or to 
choose one of lesser efficiency depends upon circumstances. If 
speed is important, estimates based on two percentiles may be 
used. If the number of cases is great, a saving in labor can be
achieved without too great loss in efficiency by using estimates
based on four or six percentiles instead of just two. When data
are scarce or are expensive to obtain and labor of computation is
less important than labor of obtaining original data, then efficient
statistics should always be used in order to obtain the maximum
information from the minimum number of cases. When data are
cheap and readily available and when computation must be done
by inexperienced persons (as is often the case in industry) or
when the results of computation are needed very quickly, then it
may be advisable to use inefficient statistics and to compensate
for the loss of information by taking a larger sample.


EXERCISE 16.1 


The frequency distribution below is based on scores of 160 persons in
a mental test of 30 multiple-choice items. Table 16.1 contains estimates
of the mean and standard deviation based on $\bar{X}$, s and on percentiles.
The reader may verify the indicated values and may compute estimates
based on other statistics. The reader should compare the accuracy of
computations based on percentiles with that based on statistics which
have efficiency one.


TABLE 16.1 Frequency Distribution of Scores of 160 Persons on 30 Items 


X f Estimates of the mean 

28 2 Formula Value 

A 1 Z = 2fX/N 21.012 
Г] 21.115 

P re 3X + Xs) 21.020 

23 21 (Xs + Xo + Xx) 20.993 

22 23 ЖХ Ха Xan + Xa) 20.992 

21 26 Estimates of the standard deviation 

20 19 Formula Value 

19 17 7737 

18 10 хү: — See 

17 9 = 2A с жыды 276 

16 6 N-1 г 

15 1 .339(Х а — Xo) 2.81 

14 1 .171(X s + Х»— X10 — Хш) 279 

13 1 120(Хн + X 0 + X — X. Хао X). 2.76 


Estimation of the Mean and Standard Deviation from Item 
Analysis Data. Concepts analogous to those used in estimating 
the mean and standard deviation of a population from percentiles 
may be used in estimating these population characteristics from 
item analysis data. Answers to each item are classified as correct 
with score 1 or as incorrect with score 0. The item analysis which 
is considered here is based on high and low groups of test scores, 
a test score being the number of items answered correctly. The 
high and low groups are placed symmetrically about the median 
of test scores, as the high and low 50%, or high and low 27%. 
Data are assumed available for all items in the test. 

The discussion will be facilitated by the introduction of an 
appropriate notation: 

N = the number of persons for whom test scores are available. 

k = the number of test items. 

p = the proportion in one of the extreme groups, p ≤ .50.

Hence the number of persons in one of the groups is pN.

$R_{Hi}$ = the number of correct responses to the ith item in the high
group.

$R_{Li}$ = the corresponding number for the low group.

$X_{H\alpha}$, $X_{L\alpha}$ = the score of the αth individual in the high group
and low group, respectively.

$y_p$ = ordinate of the unit normal curve at the point below which
the area is equal to p.
In the type of item analysis described above, $R_{Hi}$ and $R_{Li}$ are
known for each item. Consider now the sum $\sum_{i=1}^{k} R_{Hi}$ over all items.
This is the total number of correct responses in the high group.
Now the total number of correct responses in the high group can
also be obtained as the sum of all scores of individuals in the high
group, namely $\sum_{\alpha=1}^{pN} X_{H\alpha}$. Since these two sums are equal to the
same thing

$\sum_{i=1}^{k} R_{Hi} = \sum_{\alpha=1}^{pN} X_{H\alpha}$  and also  $\sum_{i=1}^{k} R_{Li} = \sum_{\alpha=1}^{pN} X_{L\alpha}$

The mean scores of the two subgroups are

$\dfrac{\Sigma X_{H\alpha}}{pN}$  and  $\dfrac{\Sigma X_{L\alpha}}{pN}$

On the assumption that the scores are distributed normally
these means are approximations of the corresponding population
means in large samples, or

(16.6)  $\dfrac{\Sigma X_{H\alpha}}{pN} \approx \mu + \dfrac{y_p}{p}\,\sigma$

and

(16.7)  $\dfrac{\Sigma X_{L\alpha}}{pN} \approx \mu - \dfrac{y_p}{p}\,\sigma$

The symbol ≈ means approximately equal.
If in (16.6) and (16.7) we substitute for $\Sigma X_{H\alpha}$ and $\Sigma X_{L\alpha}$ their
equals $\Sigma R_{Hi}$ and $\Sigma R_{Li}$ and the two formulas are combined we have

$\dfrac{1}{2}\left[\dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{pN}\right] \approx \mu$

Hence an appropriate estimate of the mean for large samples is

(16.8)  Estimate of μ = $\dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{2pN}$

Subtracting (16.7) from (16.6) in the same way gives

(16.9)  Estimate of σ = $\dfrac{1}{2Ny_p}\left[\Sigma R_{Hi} - \Sigma R_{Li}\right]$


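When upper and lower halves are used Formulas (16.8)
and (16.9) yield

(16.10)  Estimate of μ = $\dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{N}$

(16.11)  Estimate of σ = $1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N}$

When upper and lower 27% are used the estimates are

(16.12)  Estimate of μ = $1.851\,\dfrac{\Sigma R_{Hi} + \Sigma R_{Li}}{N}$

(16.13)  Estimate of σ = $1.512\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N}$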
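
The two estimates can be computed directly from the item counts. The Python sketch below is an illustration only: the function name `item_analysis_estimates` and the toy counts are hypothetical, and the formulas coded are the general forms (sum of high and low counts divided by 2pN for the mean, and their difference divided by 2N times the normal ordinate for the standard deviation), assumed here to be what Formulas (16.8) and (16.9) state. The ordinate is obtained from the unit normal curve rather than from a table.

```python
from statistics import NormalDist


def item_analysis_estimates(high_counts, low_counts, n_persons, p):
    """Estimate the mean and s.d. of test scores from item analysis data.

    high_counts[i] and low_counts[i] are the numbers of correct responses
    to item i in the high and low groups; each group holds p * n_persons
    persons, with p no greater than .50.
    """
    unit_normal = NormalDist()
    # Ordinate of the unit normal curve where the area below it equals p.
    y_p = unit_normal.pdf(unit_normal.inv_cdf(p))

    sum_high = sum(high_counts)
    sum_low = sum(low_counts)
    mean_estimate = (sum_high + sum_low) / (2 * p * n_persons)
    sd_estimate = (sum_high - sum_low) / (2 * n_persons * y_p)
    return mean_estimate, sd_estimate


# Hypothetical counts for a three-item test given to 100 persons,
# analysed with upper and lower 27% groups.
print(item_analysis_estimates([25, 20, 15], [10, 8, 5], 100, 0.27))
```

Setting the counts to a single unit difference recovers the multipliers of the special cases, which is a convenient check on the constants 1.254 and 1.512.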


The estimates based on items give results which are quite
accurate if all the items making up the test used as criterion are
available. If some of the items have been eliminated after a review
of the item analysis, the remaining item data may still be useful
for estimation of μ and σ. The important consideration in
determining usefulness of the estimates is whether the elimination
of items has changed the order of test scores to any great extent.
If many test scores have been shifted from their original grouping,
high, low or middle, then the estimates will not be satisfactory.

It is interesting to remark that the estimate of standard
deviation based on high-low 50% as shown in Formula (16.11) is
identical with the estimate based on the mean deviation of test
scores when data are available for all items. The mean deviation is
assumed based on deviations from the median.


$1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N} = 1.254\,\dfrac{\Sigma X_{H\alpha} - \Sigma X_{L\alpha}}{N}$

Now subtract the median test score from each score on the right.
This does not affect the value of the estimate since

$\sum_{\alpha=1}^{N/2} X_{H\alpha} - \sum_{\alpha=1}^{N/2} X_{L\alpha} = \sum_{\alpha=1}^{N/2} (X_{H\alpha} - X_{.50}) - \sum_{\alpha=1}^{N/2} (X_{L\alpha} - X_{.50})$

Also $(X_{H\alpha} - X_{.50}) = |X_{H\alpha} - X_{.50}|$
and $-(X_{L\alpha} - X_{.50}) = |X_{L\alpha} - X_{.50}|$

Hence  $1.254\,\dfrac{\Sigma R_{Hi} - \Sigma R_{Li}}{N} = 1.254\,\dfrac{\sum_{\alpha=1}^{N} |X_{\alpha} - X_{.50}|}{N}$

The expression on the right is an estimate of σ based on the
mean deviation. This estimate is known to have efficiency .88.
Hence .88 is also the efficiency of the estimate of σ in (16.11).
The estimate of the mean in (16.10) has efficiency one. The
efficiency of estimates when groups are not upper and lower
50% is not known to the authors.

This discussion suggests that the quantity

(16.14)  $\dfrac{R_{Hi} - R_{Li}}{2Ny_p}$

may be called an index of reliability of the ith item corresponding
to the index of reliability suggested by Gulliksen. Like that
index, it has the property that when summed over all items it
provides an estimate of σ. The corresponding index of item
difficulty is

(16.15)  $\dfrac{R_{Hi} + R_{Li}}{2pN}$

Estimation of the Correlation Coefficient. If the correlation
coefficient ρ of a bivariate normal population is to be estimated,
the Pearson product moment r is an efficient estimate. Inefficient
estimates of ρ will be described in this section.

The following procedure provides an estimate of ρ which has
efficiency of .52 when ρ = 0:

1. Arrange the cases in order of size on one of the variables,
say X. Select the 27% of the cases for which X is greatest and
the 27% for which X is least. Discard the 46% remaining cases.

2. Find the median Y score of the 54% of cases selected in
step 1. This median may not be the same as the Y median of
the entire sample.

3. Count the cases in each of the four groups and designate
the number in each group as follows:

$n_1$ above Y median of selected cases and in upper 27% of X scores
$n_2$ above Y median of selected cases and in lower 27% of X scores
$n_3$ below Y median of selected cases and in lower 27% of X scores
$n_4$ below Y median of selected cases and in upper 27% of X scores

These four numbers are shown diagrammatically below.
                     Low 27%           Middle 46%    High 27%
                     on X              (discarded)   on X              Total

Above Y median
of selected cases    n2                              n1                n1 + n2 = .27N

Below Y median
of selected cases    n3                              n4                n3 + n4 = .27N

Total                n2 + n3 = .27N                  n1 + n4 = .27N    n1 + n2 + n3 + n4 = .54N


The estimate of the correlation coefficient is read from
Chart XV. As a first step in the use of this chart compute $X_0$. If
$n_1 + n_3 > n_2 + n_4$,

$X_0 = \dfrac{n_1 + n_3}{n_1 + n_2 + n_3 + n_4}$

Locate $X_0$ on the horizontal scale of Chart XV and draw a vertical
line through $X_0$ to intersect the curve marked .27. Draw a horizontal
line from this intersection over to the vertical scale. The point
where this horizontal cuts the vertical scale indicates the required
estimate of ρ. Of course one does not actually draw these vertical
and horizontal lines on the diagram but notes their position by
using a ruler or the edge of a card.

If $n_1 + n_3 < n_2 + n_4$ evaluate the ratio

$X_0 = \dfrac{n_2 + n_4}{n_1 + n_2 + n_3 + n_4}$

and proceed as before, but the estimate is now negative.

Actually, not only 27% but any percentage will serve the
purpose. However, the estimate is more efficient when 27% is
used. Chart XV provides a means of estimating ρ by several
high-low proportions. In particular, the use of high-low 50% on
X provides an estimate equivalent to that available from the
tetrachoric coefficient for the special case when each variable is
split at the median of the sample.

The actual procedure can be carried out conveniently by use
of cards or by a scatter diagram. If cards are available, the
procedure can be carried out in the manner described above.
If a scatter diagram is used, the cases are partitioned as in Figure
16-1.

From this figure

$X_0 = \dfrac{19 + 19}{19 + 8 + 19 + 8} = \dfrac{38}{54} = .70$

On Chart XV a vertical line through the point $X_0 = .70$ on the
horizontal axis intersects the line marked .27 in the point with
vertical coordinate ρ = .40. Therefore, the estimate of ρ read
from Chart XV is .40.
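
The counting part of this procedure is easily mechanized; only the final reading from Chart XV requires the chart itself. The following Python sketch is an illustration with a hypothetical function name, `high_low_ratio`. It returns the ratio to be located on the horizontal scale together with the sign to attach to the estimate. The treatment of an exact tie between the two diagonals (ratio .50, sign taken as positive) is an assumption, since the text does not mention that case.

```python
def high_low_ratio(n1, n2, n3, n4):
    """Return (X0, sign) for the high-low estimate of the correlation.

    n1: above the Y median, upper group on X
    n2: above the Y median, lower group on X
    n3: below the Y median, lower group on X
    n4: below the Y median, upper group on X
    """
    total = n1 + n2 + n3 + n4
    concordant = n1 + n3   # cases that suggest a positive relationship
    discordant = n2 + n4   # cases that suggest a negative relationship
    if concordant >= discordant:
        return concordant / total, 1
    return discordant / total, -1


# Counts taken from the worked example in the text.
x0, sign = high_low_ratio(19, 8, 19, 8)
print(round(x0, 2), sign)   # 0.7 1
```

With the ratio .70 and a positive sign, the chart is then entered exactly as described above.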


Fig. 16-1. Scatter diagram of 100 cases partitioned into low 27% and
high 27% on X for estimation of ρ.
17 Transformation of Scales


A great deal of the theory presented in previous chap- 
ters was based on the assumption that the variable under con- 
sideration has a normal distribution. When this assumption 
appears dubious, the variable, or scale, may often be transformed 
into another variable with a distribution approximating the 
normal. The formulas for the variance of certain statistics
involve the unknown parameters of which those statistics are
estimates. Examples are $\sigma_p^2 = PQ/N$ and $\sigma_r^2 = (1 - \rho^2)^2/(N - 1)$. In such
a case, it is a great advantage if a related statistic can be found
with variance dependent only on sample size and not on any
unknown parameter. Several transformations which serve one
or both of these purposes will be briefly presented in this chapter.

The transformation of the correlation coefficient r into

$z = \dfrac{1}{2}\log_e \dfrac{1 + r}{1 - r}$

discussed in Chapter 10 and tabulated in Table XII, is an
example of a transformation which achieves both purposes.

Transformation of Proportions into Angles. Because of the
unknown parameter in $\sigma_p^2 = PQ/N$ it is often useful to transform
proportions into angles by the formula

(17.1)  $\phi = 2 \arcsin \sqrt{p}$

This formula may also be written as

(17.2)  $\phi = 2 \sin^{-1} \sqrt{p}$

Both formulas may be read "φ is twice the angle whose sine is
$\sqrt{p}$." Arcsin and $\sin^{-1}$ are discussed in trigonometry texts.
If φ is expressed in radians its variance is approximately

(17.3)  $\sigma_\phi^2 = 1/N$

This formula is valid with only slight inaccuracy when
.05 < P < .95 and N ≥ 20.

For large samples the acceptable range of P is even greater.
The value of φ in radians corresponding to a given p other
than 0 or 1 can be read from Appendix Table XIXA. Bartlett
has given the formulas

(17.4)  $\phi_0 = 2 \arcsin \sqrt{\dfrac{1}{4N}}$  for p = 0

(17.5)  and  $\phi_1 = 3.1416 - 2 \arcsin \sqrt{\dfrac{1}{4N}}$  for p = 1

The values of $\phi_0$ and $\phi_1$ can be read from Table XIXB. In testing
the hypothesis $P = P_0$ the statistic

(17.6)  $z = \sqrt{N}\,(2 \arcsin \sqrt{p} - 2 \arcsin \sqrt{P_0})$

is useful because its distribution is approximately normal with
unit variance. A test using this statistic is preferable to the test
described in Chapter 3 because the distribution of φ is more nearly
normal than that of p. When the observations in an analysis of
variance problem are proportions, homogeneity of variance
cannot be assumed because $\sigma_p$ varies with P and with N. If all
proportions are based on the same N and if each is transformed to
an angle, homogeneity of variance is secured because each angle
has the same variance, 1/N, even though the proportions differ.
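
The angular transformation and the test statistic built on it can be sketched as follows. The function names are hypothetical, and the sketch covers only proportions strictly between 0 and 1; the special adjustments for p = 0 and p = 1 are left to the tables cited in the text.

```python
import math


def angle(p):
    """Transform a proportion into an angle in radians: 2 arcsin sqrt(p)."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch handles only 0 < p < 1")
    return 2.0 * math.asin(math.sqrt(p))


def angular_z(p_observed, p_hypothesis, n):
    """Approximately unit-normal statistic for testing P = p_hypothesis.

    Uses the fact that the transformed proportion has variance near 1/N.
    """
    return math.sqrt(n) * (angle(p_observed) - angle(p_hypothesis))


# Hypothetical example: 9 successes in 36 trials, hypothesis P = .50.
print(round(angular_z(9 / 36, 0.50, 36), 2))
```

Because the variance of the angle does not depend on P, the same divisor 1/N serves whatever hypothesis is tested, which is the advantage claimed for the transformation.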
The Square Root Transformation. A very skewed distribution
known as the Poisson Distribution, which often arises in
connection with the frequency of occurrence of a very rare event,
has $\sigma^2 = \mu$. To make tests of significance concerning the means
of such distributions the variable (that is, the number of
occurrences) should be replaced by its square root.

The Logarithmic Transformation. If the scores in an analysis
of variance layout are so distributed that the mean of each
subgroup is approximately equal to its standard deviation, then each
score should be replaced by its logarithm before the analysis of
variance is carried out. For example, if the entries are sample
variances this situation would occur.

Transformation of Ranks to Normal Deviates. The use of
ranks in computing a correlation coefficient was described in
Chapter 11. It often seems reasonable to assume that the
underlying variable represented by the ranks is normally distributed.
Appendix Table XX may be used to transform a set of ranks into
a set of scores from a normal distribution in which μ = 50 and
σ = 10, for samples in which 5 < N ≤ 30. To find the normal
equivalent of a rank R when N > 30, first find the proportion

(17.7)  $p = 1 - \dfrac{R - .5}{N}$

then read in Table II the normal deviate $z_p$ in a unit normal curve
corresponding to the proportion p. The corresponding normal
deviate when μ = 50 and σ = 10 is

$Z_p = 10z_p + 50$


Normalization of the F Distribution. The F distribution has
so far been tabulated for only a few significance levels. When
a statistician needs to know the significance of a value of F lying
between these tabulated values, he may take the cube root of the
observed F and substitute it in a formula given by Paulson

(17.8)  $u = \dfrac{\left(1 - \dfrac{2}{9n_2}\right)F^{1/3} - \left(1 - \dfrac{2}{9n_1}\right)}{\sqrt{\dfrac{2}{9n_2}F^{2/3} + \dfrac{2}{9n_1}}}$

where $n_1$ and $n_2$ are the degrees of freedom of the numerator and
denominator of F. The statistic u has a distribution which is nearly
unit normal when the error variance has at least 3 degrees of freedom.
Therefore, a good approximation to the significance level for F can
be obtained by computing u and taking α as the area in the two tails
of the normal distribution.
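
A sketch of this normalization in Python follows. It codes the cube-root approximation usually attributed to Paulson, which is assumed here to be the content of Formula (17.8); the function names and the symbols `n1` and `n2` (degrees of freedom of the numerator and of the denominator, that is, the error variance) are choices made for the illustration. The second function returns the area in the upper tail only, which is the probability that a unit normal deviate exceeds u.

```python
import math
from statistics import NormalDist


def paulson_u(f, n1, n2):
    """Nearly unit-normal deviate corresponding to an observed F.

    n1 = degrees of freedom of the numerator of F
    n2 = degrees of freedom of the denominator (error variance), n2 >= 3
    """
    a1 = 2.0 / (9.0 * n1)
    a2 = 2.0 / (9.0 * n2)
    root = f ** (1.0 / 3.0)          # cube root of the observed F
    numerator = (1.0 - a2) * root - (1.0 - a1)
    denominator = math.sqrt(a2 * root * root + a1)
    return numerator / denominator


def upper_tail_area(f, n1, n2):
    """Approximate probability of an F at least as large as the one observed."""
    return 1.0 - NormalDist().cdf(paulson_u(f, n1, n2))


# Check against a tabulated point: F = 2.87 is the 5% point for 4 and 20
# degrees of freedom, so the area found should be close to .05.
print(round(upper_tail_area(2.87, 4, 20), 3))
```

Comparing the result at a tabulated point with the nominal level gives a quick impression of how close the approximation is before it is used between tabulated values.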
Uniformization. When the form of distribution of a variable
is not known or is presumed to be not normal, the ordinary tests
of significance are not justified. Then the scores may be
transformed to ranks and a non-parametric test based on rank order
may be performed.
Most of the methods presented in previous chapters depend
upon the assumption that samples have been drawn from a normal
population. In many problems, this assumption looks quite
unreasonable. In this chapter several methods will be described for
making inferences without any assumption as to the form of
distribution in the population. Such statistical methods are called
non-parametric or distribution-free. Examples of non-parametric
methods which have already been studied are the χ² test, the
percentiles, and the rank-correlation coefficient.

Since the methods to be presented are valid for any parent
population (or, in some cases, for any parent population on a
continuous variable), they could be validly applied to samples from
normal populations. Almost always this would be unwise because
of the loss of efficiency, as this concept was discussed on page 414.
The tests based on normality assumptions are in general best
possible tests for samples from a normal population and the use
of other tests is disadvantageous.

If the samples are not from a normal population and we use a
non-parametric test with level of significance α, then for any parent
population whatever the probability of an error of the first kind
is actually equal to α, or less than α because of discreteness. On
the other hand, using a normal theory test with level of significance
α under the same circumstances does not assure that the probability
of an error of the first kind is controlled at level α. That
probability may be greater or less than α, depending upon the form of
the parent population; generally we have no way of knowing the
direction or degree of departure from α.


A. Tests for Comparison of Two Samples 


Kolmogorov-Smirnov Test. A two-sample test which is
sensitive to any kind of difference in the distributions from which
the samples are drawn, is one based on the sample cumulative
percentage polygons. If two samples have, in fact, been drawn
from populations with the same continuous distribution, then both
cumulative percentage distributions should resemble the parent
distribution, and should resemble each other. If the distributions
are too far apart at any point it is cause for rejecting the null
hypothesis. The test statistic is D = the maximum vertical
difference between the two polygons.

* This chapter was written by Professor Lincoln Moses of Stanford University.
Because of limitations of space, the manuscript was slightly changed after it left his
hands. For any errors which may have been caused by such abridgment the authors
take full responsibility.
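
The statistic D can be computed without drawing the polygons. The Python sketch below is an illustration with a hypothetical function name, `max_cumulative_difference`. It compares the two sample cumulative distributions as step functions evaluated at every observed value, which is the usual computational form of the two-sample statistic and is assumed here to agree with the graphical procedure described in the text.

```python
from bisect import bisect_right


def max_cumulative_difference(sample_x, sample_y):
    """Largest vertical distance between two sample cumulative distributions."""
    xs = sorted(sample_x)
    ys = sorted(sample_y)
    d = 0.0
    # The distance can change only at an observed value, so it is
    # enough to evaluate both cumulative proportions at those points.
    for point in sorted(set(xs) | set(ys)):
        cum_x = bisect_right(xs, point) / len(xs)
        cum_y = bisect_right(ys, point) / len(ys)
        d = max(d, abs(cum_x - cum_y))
    return d


# Hypothetical scores for two small groups.
print(max_cumulative_difference([1, 3, 5], [2, 4, 6]))
print(max_cumulative_difference([1, 2, 3], [4, 5, 6]))
```

The second call shows the extreme case of two samples that do not overlap at all, for which the statistic reaches its largest possible value of 1.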


TABLE 18.1 Significance Points for Maximum Difference between Two Sample
Cumulative Distribution Functions (N1 = N2 ≤ 40)*


D = k/N
N           Values of k for
            α ≤ .05    α ≤ .03    α ≤ .01

10 or 11 6 7 

12 6 7 

13 6 7 8 
14 or 15 7 8 
16 or 17 7 8 9 
18 or 19 8 9 
20 — 22 8 9 10 

23 9 10 

24 9 11 
25 — 27 9 10 11 
28 or 29 10 12 
30 — 32 10 12 
33 or 34 11 
35 — 39 11 12 

40 12 


* Adapted from Massey * 


For $N_1 = N_2 \le 40$ values of k = ND calling for rejection at various
significance levels are given in Table 18.1. For larger values of $N_1$
and $N_2$, whether or not they are equal, rejection values can be
calculated from the formulas in Table 18.2.

It is important to note that the scale on which the observations
are made need not have a zero point or even have equal intervals
in order for this test procedure to be valid; it is only necessary
that the order of size of observations be reflected in the scale of
measurement. The reader can see that this statement is reasonable
by observing that if the horizontal scale along which the scores
are measured be stretched in some parts, and compressed in others,
the maximum vertical distances between the polygons will remain
unchanged.

TABLE 18.2 Significance Points for Maximum Difference Between Two Sample
Cumulative Distribution Functions (N1, N2 > 40)*

α        Value of D so large as to call for
         rejection at level α

.10      $1.22\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$
.05      $1.36\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$
.025     $1.48\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$
.01      $1.63\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$
.005     $1.73\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$
.001     $1.95\sqrt{\dfrac{N_1 + N_2}{N_1 N_2}}$

* From Smirnov
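
For large samples the rejection value of D can be computed rather than looked up. The Python sketch below is an illustration only: the names `LARGE_SAMPLE_COEFFICIENTS` and `critical_d` are hypothetical, the coefficients are those listed in Table 18.2, and the multiplier sqrt((N1 + N2)/(N1 N2)) is the standard large-sample form, assumed here to be the one printed in that table.

```python
import math

# Coefficients for each significance level, as listed in Table 18.2.
LARGE_SAMPLE_COEFFICIENTS = {
    0.10: 1.22,
    0.05: 1.36,
    0.025: 1.48,
    0.01: 1.63,
    0.005: 1.73,
    0.001: 1.95,
}


def critical_d(n1, n2, alpha):
    """Value of D calling for rejection at level alpha (large samples)."""
    if alpha not in LARGE_SAMPLE_COEFFICIENTS:
        raise ValueError("no coefficient tabulated for this level")
    coefficient = LARGE_SAMPLE_COEFFICIENTS[alpha]
    return coefficient * math.sqrt((n1 + n2) / (n1 * n2))


# Hypothetical case: two samples of 50 observations each, 5% level.
print(round(critical_d(50, 50, 0.05), 3))
```

An observed maximum difference larger than the value returned would call for rejection of the hypothesis that the two populations have the same distribution.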

Run Test. Suppose we have a sample of №, observations 
which may be called X and a sample of N, observations which 
may be called Y, and wish to test the null hypothesis that the 
populations from which these samples come have the same dis- 
tribution. The run test has the property that if the hypothesis 
tested at level o is true, then the probability of rejecting that 
hypothesis is œ no matter what the form of the common population 
may be. If on the other hand, the two population distributions 
differ in any way whatever, then the probability of rejection tends 
to one as the two sample sizes are increased without limit. 

The $N_1 + N_2$ observations should be arranged in order of
increasing size. A run in the observations so ordered is defined as
a sequence of letters of the same kind which cannot be extended
by incorporating an adjacent observation. Thus in the 21
observations below (arranged in increasing order) there are 10 runs:

    XXX  Y  X  YYYY  XX  Y  XXX  Y  XX  YYY

Successive runs are set apart by spaces. If the two samples are
from a common population the X's and Y's will generally be well
mixed and the number of runs will be large. If the X population
has a much higher median than the Y population, a long run of Y's
is to be expected at the left end, a long run of X's at the right
end, and consequently, a reduced total number of runs. If the X's
come from a population with much greater dispersion, there should
be a long run of X's at each end, and consequently a reduced total
number of runs. Generally, then, rejection of the null hypothesis
will be indicated if the runs are too few in number. Let $v$
represent the number of runs in the sample. If $N_1 > 10$ and
$N_2 > 10$, the test of significance is obtained by taking $v$ to be
normally distributed with mean


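$$\mu_v = \frac{2 N_1 N_2}{N_1 + N_2} + 1 \tag{18.1}$$

and variance

$$\sigma_v^2 = \frac{2 N_1 N_2 (2 N_1 N_2 - N_1 - N_2)}{(N_1 + N_2)^2 (N_1 + N_2 - 1)} \tag{18.2}$$

In the example given, $v = 10$ with $N_1 = 11$ and $N_2 = 10$.

$$\mu_v = \frac{2(11)(10)}{11 + 10} + 1 = 11.48$$

$$\sigma_v = \sqrt{\frac{2(11)(10)[2(11)(10) - 11 - 10]}{(11 + 10)^2 (11 + 10 - 1)}} = \sqrt{4.96} = 2.23$$

$$z = \frac{v - \mu_v}{\sigma_v} = \frac{10 - 11.48}{2.23} = -.66$$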


The null hypothesis is not rejected and the two samples may be 
regarded as coming from a common population. 
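
The arithmetic of the run test can be checked with a short Python
sketch. The ordered sequence below is an assumed arrangement chosen
to agree with the counts stated in the example (11 X's, 10 Y's,
10 runs); the function names are illustrative only.

```python
import math

def count_runs(labels):
    """Count maximal blocks of identical adjacent labels."""
    runs = 1
    for prev, cur in zip(labels, labels[1:]):
        if cur != prev:
            runs += 1
    return runs

def run_test_z(labels, first="X"):
    """Number of runs with the normal-approximation mean, s.d. and z."""
    n1 = sum(1 for c in labels if c == first)
    n2 = len(labels) - n1
    v = count_runs(labels)
    mean = 2 * n1 * n2 / (n1 + n2) + 1                      # formula (18.1)
    var = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
           / ((n1 + n2) ** 2 * (n1 + n2 - 1)))              # formula (18.2)
    sd = math.sqrt(var)
    return v, mean, sd, (v - mean) / sd

# Assumed ordering: 11 X's and 10 Y's forming 10 runs.
ordered = "XXXYXYYYYXXYXXXYXXYYY"
v, mean, sd, z = run_test_z(ordered)
print(v, round(mean, 2), round(sd, 2), round(z, 2))
```

Since too few runs lead to rejection, only a large negative value of
z would count against the null hypothesis here.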

Since the test is to be used for continuous variables no ties
should appear; if there are a few ties they should be broken at 
random. Suppose for example that three X’s and two Y’s are tied 
at a common value. To determine where the two values of Y 
are to be placed in this list, enter a table of random numbers, and 
read down a column until you reach one of the digits 1, 2, 3, 4, or 5. 
Suppose the digit 3 is reached first and then the digit 1. Then 
one Y will take position 3 and the other take position 1. The X 
values will take the remaining positions. Then the set of 5 tied 
observations is rewritten as YXYXX. If any tied set consists 
only of X's or only of Y's there is no need to break the tie. An 
alternative procedure is to break the ties in all possible ways and 
for each such way to calculate the number of runs and the
associated probability, and then to take the average of these
probabilities for the probability associated with the sample.

The run test may be used either where the original data are
ranks or where the original data have been recorded as
measurements and reduced to ranks in order to perform the test.

It is important to note that the run test, being sensitive to any
kind of difference in the two populations, is not a very powerful
test of difference in location, that is, a situation in which one
population tends to have higher scores or higher ranks than the 
other. If the investigator wishes to test against a particular kind 
of alternative (such as difference of means) he should usually use 
a test designed for this alternative rather than the run test. 
Several such tests will now be presented. 

The Sign Test. The sign test is one of the easiest and most 
widely applicable of statistical tests. It was used in Chapter 1 
of this text. An even number of subjects, say 2N, is divided into 
N pairs, the two members of each pair being as similar as possible 
to each other with respect to certain extraneous variables which are 
not the subject of investigation, but which may affect the observa- 
tions. In each pair of subjects, the choice of which member receives 
experimental condition A is determined by some random device. 

If it seems reasonable to assume that differences are normally 
and independently distributed with common variance and that 
differences are measured on a scale in which intervals are equal, 
the mean and standard deviation of the N differences should be 
computed and a t-test used to test the hypothesis that the mean 
difference is zero (as was done on page 152). When the investigator 
cannot make, or prefers to avoid, these assumptions, he may em- 
ploy the sign test. It tests the hypothesis that the median of the 
population of differences is zero and makes no assumption about 
the form of distribution of these differences. 

The appropriate statistic is the number of differences which
are positive (or the number negative). If the null hypothesis is
true, a sample will usually show about $\frac{1}{2}N$ positive and
about $\frac{1}{2}N$ negative differences. Clearly the sampling
distribution of the number $m = Np$ of differences of one sign is
the binomial distribution $(Q + P)^N$ with $P = \frac{1}{2}$.

For samples of $N \le 25$ the required probability can be read
directly from Table IVB. Suppose $m$ is the number of differences
of one sign and $m < \frac{1}{2}N$. For instance suppose there are
5 negative and 14 positive differences in a sample of 19 pairs. In
row $N = 19$ and column $m = 5$ of Table IVB the entry is .032. For a
two-tailed test, the result would be significant at
$\alpha = 2(.032) = .064$.
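
The tabled probability can be reproduced from the binomial
distribution with P = 1/2. The following is a minimal Python
sketch; the function name `sign_test_tail` is illustrative and is
not part of any table or library.

```python
from math import comb

def sign_test_tail(m, n):
    """P(at most m differences of one sign among n) when P = 1/2."""
    return sum(comb(n, k) for k in range(m + 1)) / 2 ** n

one_tail = sign_test_tail(5, 19)   # 5 negative differences in 19 pairs
two_tail = 2 * one_tail
print(round(one_tail, 3), round(two_tail, 3))   # prints 0.032 0.064
```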
For larger samples, an approximation to the exact probabilities
can be obtained by computing $\chi^2$ with one degree of freedom
(corrected for continuity). Both theoretical frequencies are
$F_i = \frac{1}{2}N$. The values of $f_i$ are the sample frequencies
shifted by .5 toward $\frac{1}{2}N$. If this method had been used for
the problem of the preceding paragraph we should have shifted the
observed frequencies from 5 and 14 to 5.5 and 13.5; computed
$\chi^2 = 3.368$ as indicated; and found $z = \sqrt{3.368} = 1.84$.

      f        F      (f - F)^2 / F
     5.5      9.5        1.684
    13.5      9.5        1.684
                         3.368 = $\chi^2$

Referring to a normal probability table we find that for a
two-sided test, this is significant at $\alpha = .066$ which is a good
approximation to the exact value previously found. In the case of a
one-sided hypothesis it is necessary to remember that rejection of
the null hypothesis at level $\alpha$ is in order only when
$\chi^2 > \chi^2_{1-2\alpha}$ and the more frequent sign is opposite to
the hypothesized direction. A procedure equivalent to computing
$\chi^2$ corrected for continuity is to compute

$$\frac{2m \pm 1}{\sqrt{N}} - \sqrt{N}$$

and to regard it as a normal deviate. The 1 is added or subtracted
in such a way as to change $2m$ to a value nearer $N$. Thus for
$m = 5$ and $N = 19$ we should have

$$\frac{2(5) + 1}{\sqrt{19}} - \sqrt{19} = 2.524 - 4.359 = -1.835$$

as before.
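
The equivalence of the two computations may be seen in a brief
Python sketch. The helper name is hypothetical, and the two-sided
probability is taken from the normal curve through `math.erfc`
rather than from a printed table.

```python
import math

def sign_test_normal(m, n):
    """Continuity-corrected normal deviate for m differences of one sign."""
    shift = 1 if 2 * m < n else -1            # move 2m toward N
    z = (2 * m + shift) / math.sqrt(n) - math.sqrt(n)
    chi2 = z * z                              # equals the corrected chi-square
    two_sided_p = math.erfc(abs(z) / math.sqrt(2))
    return z, chi2, two_sided_p

z, chi2, p = sign_test_normal(5, 19)
print(round(z, 3), round(chi2, 3))            # prints -1.835 3.368
```

Squaring the deviate gives the same 3.368 found from the shifted
frequencies, which is why the two procedures agree.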

The assumptions underlying the test are that the differences 
are continuously distributed and independent. There is no as- 
sumption as to the form of the distribution. In particular nothing 
analogous to homogeneity of variance is assumed. The sign test 
is obviously well suited to data for which measurement is difficult 
but judgment between a pair of objects is possible. 

Extension of the Sign Test. If the data are measurements on
a scale of equal intervals, certain more general hypotheses may be
tested. For example by subtracting a constant $C$ from every
difference, that is by taking $d'_i = X_{Ai} - X_{Bi} - C$, we can test
the null hypothesis that the median difference $X_A - X_B$ in the
population is at least $C$. The hypothesis is rejected for too many
$d'_i$ positive.

If the data are measurements on a scale of equal intervals and
if a zero point exists, then by considering the differences

$$d''_i = X_{Ai} - E X_{Bi}$$

we can test the hypothesis

$$H: P(X_A > E X_B) \le .5$$

rejecting it for too many $d''_i$ positive. This device can be used
only where X is necessarily always positive.

As these tests are intended to be used with continuous vari- 
ables, ties should occur only rarely as a result of approximation 
in measurement. If a few differences are zero, such pairs are 
omitted from the sample, since they can give no information for 
comparing the two experimental conditions. 

Signed Rank Test for Paired Observations. Like the sign 
test, this test can be used when observations are obtained in 
matched pairs. The hypothesis tested is that the differences be- 
tween observations are symmetrically distributed around a mean 
of zero. As before 2N subjects are divided into N pairs, the two 
members of each pair being matched as nearly as possible. In 
each pair the choice of which member receives experimental 
condition A is made at random. 

The difference between the score under treatment A and the 
score under treatment B in the ith pair will be denoted 


$$d_i = X_{Ai} - X_{Bi}$$

To perform the test rank the $d_i$ in increasing order of absolute
magnitude. (Thus in the example which follows the difference
5.4 - 5.3 = .1 is given rank one because in absolute value it is
smaller than any of the other differences.) Each rank is then
suffixed by the sign of the difference from which it arose. The
sum of all the ranks with plus signs will for convenience be called
the positive rank sum; the sum of all the ranks with negative
signs will be called the negative rank sum. Under the null
hypothesis these two rank sums should be about equal. If most of
the differences $d_i$ are positive, and the negative ones are small,
then the negative rank sum will be small and rejection of the null
hypothesis in favor of $E(d_i) > 0$ is invited. A two-sided test
requires rejection if either rank sum is too small. Table 18.3 gives
approximate two-sided significance points for $N \le 25$. If more
than 25 pairs are involved, then $T$, the absolute value of the
smaller rank sum, is approximately normally distributed with

$$\mu_T = \frac{N(N + 1)}{4} \tag{18.3}$$

and

$$\sigma_T = \sqrt{\frac{N(N + 1)(2N + 1)}{24}} \tag{18.4}$$

TABLE 18.3 Significance Points for the Absolute Value of the Smaller Sum of
Signed Ranks (T) Obtained from Paired Observations *

            Two-sided significance level
    N       .05      .02      .01

    6        0        -        -
    7        2        0        -
    8        4        2        0
    9        6        3        2
   10        8        5        3
   11       11        7        5
   12       14       10        7
   13       17       13       10
   14       21       16       13
   15       25       20       16
   16       30       24       20
   17       35       28       23
   18       40       33       28
   19       46       38       32
   20       52       43       38
   21       59       49       43
   22       66       56       49
   23       73       62       55
   24       81       69       61
   25       89       77       68

* From Wilcoxon * by permission of the author and the
American Cyanamid Company


The following example will serve to illustrate the computations 
for the signed rank test: 
  Pair    X_A     X_B    d = X_A - X_B   Rank of |d|   Signed rank

    1     7.6     7.3         .3              3             3
    2     6.3     5.7         .5              4             4
    3    10.3    10.5        -.2              2            -2
    4     6.2     4.7        1.5              8             8
    5     5.4     5.3         .1              1             1
    6     9.3     8.9        1.1              6             6
    7    10.0     9.1         .9              5             5
    8     8.4     7.0        1.4              7             7

Positive rank sum = 34; Negative rank sum = 2; Smaller rank sum is T = 2.
From Table 18.3 we find that a T of 4 or less is significant at
$\alpha = .05$ for the two-sided hypothesis $E(d) = 0$. If our hypothesis
was $E(d) \le 0$ then a negative rank total of 4 or less is significant
at $\alpha = .025$ (approximately).

Sum of Ranks. The statistical procedure to be described
below can be used to test the hypothesis that the $N_1$ values of X
and the $N_2$ values of Y are samples from a common population.
It is sometimes called a test of difference in location. It guards
against the one-sided alternatives $P(X > Y) < \frac{1}{2}$ or
$P(X > Y) > \frac{1}{2}$, or against the two-sided alternative
$P(X > Y) \ne \frac{1}{2}$.

Consider the two samples shown here, with $N_1 = 9$ and $N_2 = 8$.


         Group I                 Group II
     Score X    Rank         Score Y    Rank

      11.5        3           15.2        7
      12.6        5            8.6        1
      19.4       13            9.3        2
      21.3       14           14.4        6
      32.5       17           15.6        8
      18.6       12           11.8        4
      17.0       10           16.3        9
      23.4       15           17.8       11
      29.6       16

      $R_1 = 105$              $R_2 = 48$


As a first step, ranks from least to greatest have been assigned to
the entire $N = N_1 + N_2 = 17$ scores and the sum of the ranks
obtained for each group separately. The sum of ranks for group
$i$ will be denoted $R_i$. As a check, it should be noted that
$R_1 + R_2$ must equal $\frac{N(N + 1)}{2}$. In this case

$$\frac{N(N + 1)}{2} = \frac{17(18)}{2} = 153 \quad \text{and} \quad 105 + 48 = 153.$$

If $N_1$ and $N_2$ are each as large as 8, or larger, the statistic

$$z = \frac{2R_i - N_i(N + 1)}{\sqrt{\dfrac{N_1 N_2 (N + 1)}{3}}} \tag{18.5}$$

has a distribution which is approximately unit normal. If $R_i$ is
taken from the first sample we have

$$z = \frac{2(105) - 9(18)}{\sqrt{\dfrac{9(8)(18)}{3}}} = \frac{210 - 162}{\sqrt{432}} = \frac{48}{20.8} = 2.31.$$

If $R_i$ is taken from the second sample we have

$$z = \frac{2(48) - 8(18)}{\sqrt{\dfrac{8(9)(18)}{3}}} = \frac{96 - 144}{\sqrt{432}} = \frac{-48}{20.8} = -2.31.$$

For a two-sided test, this would be significant at
$\alpha = 2(.0104) = .021$, and for a one-sided test, at level
$\alpha = .0104$. If the null hypothesis states that the X-median is
not greater than the Y-median we should reject only when $R_1$ is
significantly large (that is, $R_2$ is small), and vice versa.

If the number in either sample is small, a table * may be used 
to obtain exact probabilities. 

This test was proposed by Wilcoxon * in the situation in which
$N_1 = N_2$ and exact probabilities given for small samples. It has
since been derived by several other people and is usually known as
the Mann-Whitney Test.
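
The ranking and the statistic (18.5) can be reproduced in a few
lines of Python. This is a sketch under the assumption that no two
scores are tied, as is true of the example; `rank_sum_z` is a
hypothetical helper name.

```python
import math

def rank_sum_z(x, y):
    """Rank sums of two samples and the deviate (18.5) for the first."""
    pooled = sorted(x + y)
    rank = {score: i for i, score in enumerate(pooled, start=1)}
    r1 = sum(rank[s] for s in x)
    r2 = sum(rank[s] for s in y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    z = (2 * r1 - n1 * (n + 1)) / math.sqrt(n1 * n2 * (n + 1) / 3)
    return r1, r2, z

x = [11.5, 12.6, 19.4, 21.3, 32.5, 18.6, 17.0, 23.4, 29.6]
y = [15.2, 8.6, 9.3, 14.4, 15.6, 11.8, 16.3, 17.8]
r1, r2, z = rank_sum_z(x, y)
print(r1, r2, round(z, 2))     # prints 105 48 2.31
```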

Median Test for Two Samples. An even simpler test than that
based on ranks is arrived at by merely classifying all scores as
being above or not above the median of the combined samples.
(When $N = N_1 + N_2$ is an odd number one of the scores is the
median, and this median score will fall into the class of scores
"not above the median.") A contingency table is now set up and
$\chi^2$ computed, or if the number of cases is small the exact
probability may be computed by the method described on page 102.

                        Group I    Group II    Total
   Above median            7          1           8
   Not above median        2          7           9
   Total                   9          8          17

The data of the preceding paragraph with 16.3 as median of the
combined sample produce the contingency table shown here.
For samples as small as this, Fisher's exact method described on
page 103 must be used. For large enough frequencies $\chi^2$ corrected
for continuity is the test statistic.
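
For a table this small the exact probability can be obtained
directly. The Python sketch below uses the usual hypergeometric
form of the exact test with all margins fixed; it is offered as an
illustration and is assumed, not confirmed, to match the
computation on the page cited. The function name is hypothetical.

```python
from math import comb

def exact_upper_tail(a, b, c, d):
    """P(top-left cell >= a) for a 2 x 2 table with fixed margins."""
    row1, col1, n = a + b, a + c, a + b + c + d
    largest = min(row1, col1)
    total = comb(n, row1)
    return sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, largest + 1)) / total

# Above median: 7 and 1; not above median: 2 and 7.
p_one_sided = exact_upper_tail(7, 1, 2, 7)
print(round(p_one_sided, 4))
```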


B. Comparison of k Samples 


Median Test for k Samples. This test is a natural extension
of the two-sample test described in the preceding paragraph.
Suppose that 28 subjects are divided at random into 5 groups
A, B, C, D, and E, and each group is afforded a different
motivating rationale for excelling in a maze test. They are then
given the same maze test and the time required for completion is
recorded. Let the time in seconds be:

      A       B       C       D       E
     18.1    16.7    24.7    18.2    12.4
     24.0    17.4    36.5    25.9    18.8
     31.7    22.4    42.1    27.0    19.3
     32.3    27.1    43.2    36.6    22.5
     35.5    35.8    48.7    37.6    35.1
     46.2            50.4    39.8

The grand median is found to lie between 27.1 and 31.7, therefore
4 observations in group A are greater than this median, which we
call $X_{.50}$. In group B only one observation is greater than
$X_{.50}$. Continuing in this way we arrive at the following summary:

                      A     B     C     D     E    Total
   Above $X_{.50}$       4     1     5     3     1     14
   Below $X_{.50}$       2     4     1     3     4     14
   $N_i$                 6     5     6     6     5     28

The data exhibit some discrepancy from what might be expected
under the null hypothesis. To assess whether the discrepancy
is significant we may treat the data as a contingency table and
compute $\chi^2$ from the frequencies (which are large enough in this
example to justify that procedure). The value of $\chi^2$ with 4
degrees of freedom is 6.93 which is not large enough to be
significant even at level $\alpha = .10$; so the null hypothesis is not
rejected.
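
The value 6.93 follows from the ordinary contingency-table
computation, which may be sketched in Python as below. The function
name is illustrative; expected frequencies are formed from the
marginal totals in the usual way.

```python
def median_test_chi2(above, below):
    """Chi-square for a 2 x k table of counts above and below the median."""
    total_above, total_below = sum(above), sum(below)
    n = total_above + total_below
    chi2 = 0.0
    for a, b in zip(above, below):
        column = a + b
        expected_a = column * total_above / n
        expected_b = column * total_below / n
        chi2 += (a - expected_a) ** 2 / expected_a
        chi2 += (b - expected_b) ** 2 / expected_b
    return chi2, len(above) - 1           # statistic and degrees of freedom

chi2, df = median_test_chi2([4, 1, 5, 3, 1], [2, 4, 1, 3, 4])
print(round(chi2, 2), df)                 # prints 6.93 4
```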

If N, the number of observations in all k samples taken
together, is odd then one of the observations will be $X_{.50}$.
Consider that observation as being below the median.

This test is based upon the assumption that all populations
have the same form of distribution. It is believed, however, that
the test is not sensitive to differences in population form.

Sum of Ranks for Comparing k Samples. This extension of
the two sample comparison by ranks to problems in which there
are k samples is due to Kruskal and Wallis. The samples are
merged and a rank of 1 assigned to the lowest score, 2 to the next
lowest, and so on. Then the sum of the ranks is found for each
of the k samples. Let $N_i$ be the number of observations in the
ith sample; $R_i$ the sum of the ranks assigned to the observations
in that sample; and $N = \sum_{i=1}^{k} N_i$. The test statistic to be
computed if there are no ties is

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} \frac{R_i^2}{N_i} - 3(N + 1) \tag{18.6}$$

If there are ties in rank because two or more observations are
equal, each observation is given the mean of the ranks which those
observations would have received had there been no ties, and H
is divided by

$$1 - \frac{\sum T}{N(N^2 - 1)} \tag{18.7}$$

If $t$ is the number of observations in one tied set,
$T = (t - 1)t(t + 1)$ for that set. $\sum T$ is taken over all groups
of ties.

If the k samples come from identical populations, and the N's
are not very small, H is distributed approximately as $\chi^2$ with
$k - 1$ degrees of freedom. The hypothesis is rejected for large
values of H.

By way of illustration let ranks be assigned to the data of the
preceding section for the times required for running a maze under
the various motivations A, B, C, D, and E. The ranks and the
sum of the ranks for the five groups would be as in Table 18.4.

TABLE 18.4 Computation of Analysis of Variance by Ranks (One Criterion
of Classification)

           A              B              C              D              E
      Score  Rank    Score  Rank    Score  Rank    Score  Rank    Score  Rank
      18.1     4     16.7     2     24.7    11     18.2     5     12.4     1
      24.0    10     17.4     3     36.5    20     25.9    12     18.8     6
      31.7    15     22.4     8     42.1    24     27.0    13     19.3     7
      32.3    16     27.1    14     43.2    25     36.6    21     22.5     9
      35.5    18     35.8    19     48.7    27     37.6    22     35.1    17
      46.2    26                    50.4    28     39.8    23
 $R_i$        89             46            135             96             40
 $N_i$         6              5              6              6              5

A check is provided by the relation $\sum_{i=1}^{k} R_i = \frac{N(N + 1)}{2}$:

$$89 + 46 + 135 + 96 + 40 = 406 \quad \text{and} \quad \frac{28(29)}{2} = 406$$

$$H = \frac{12}{28(29)} \left( \frac{89^2}{6} + \frac{46^2}{5} + \frac{135^2}{6} + \frac{96^2}{6} + \frac{40^2}{5} \right) - 3(29) = 11.1$$

and for 4 degrees of freedom $\chi^2_{.975} = 11.1$.

This result differs from the non-significant result obtained 
with the median test. This is not particularly surprising. The 
present procedure employing ranks utilizes the magnitudes of the 
observations more fully and so may be expected to be somewhat 
more sensitive than a procedure which merely classifies observa- 
tions as above or not above the median. 
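
The rank sums and the value of H can be checked with a short
Python sketch. It implements formula (18.6) only, without the
correction for ties, which is not needed here because the 28 times
are all different; the function name is illustrative.

```python
def kruskal_wallis_h(samples):
    """Rank sums and H of formula (18.6); assumes no tied scores."""
    pooled = sorted(score for s in samples for score in s)
    rank = {score: i for i, score in enumerate(pooled, start=1)}
    n = len(pooled)
    rank_sums = [sum(rank[v] for v in s) for s in samples]
    h = (12 / (n * (n + 1))
         * sum(r * r / len(s) for r, s in zip(rank_sums, samples))
         - 3 * (n + 1))
    return rank_sums, h

maze = [
    [18.1, 24.0, 31.7, 32.3, 35.5, 46.2],   # A
    [16.7, 17.4, 22.4, 27.1, 35.8],         # B
    [24.7, 36.5, 42.1, 43.2, 48.7, 50.4],   # C
    [18.2, 25.9, 27.0, 36.6, 37.6, 39.8],   # D
    [12.4, 18.8, 19.3, 22.5, 35.1],         # E
]
rank_sums, h = kruskal_wallis_h(maze)
print(rank_sums, round(h, 1))   # prints [89, 46, 135, 96, 40] 11.1
```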

Analysis of Variance by Ranks. The analysis of variance by 
ranks is a very easy procedure and does not depend on assumptions 
of normality or of homogeneity of variance. It has the further 
advantage of enabling data which are inherently only ranks to be 
examined for significance. If the Wallis-Kruskal H test is parallel
to the one-way analysis of variance, then a test due to Friedman
is parallel to randomized blocks.*

Suppose each of m subjects is measured under each of p situa- 
tions (or p treatments); or each of m groups takes each of p tests. 
In such situations, it is often not reasonable to assume homogeneity 
of variance. 


TABLE 18.5 Analysis of Variance with Ranked Data. Two-way Classification

               Score under Treatment          Rank of Treatment
  Subject     A    B    C    D    E          A    B    C    D    E
     1        …    …    …    9   20          4    2    3    5    1
     2       12   11   13   10   18          3    4    2    5    1
     3        …    …    …    …    …          3    2    4    5    1
     4        …    …    …    …    …          5    3    2    4    1
  $T_j$ = Total                              15   11   11   19    4


The p treatments are ranked for each subject, and the ranks
summed for all subjects as in Table 18.5. The correctness of the
totals may be checked by noting that their sum must be
$\frac{mp(p + 1)}{2}$, which for Table 18.5 is
$\frac{4(5)(5 + 1)}{2} = 60$. If the treatments are not very
different, the rank totals may be expected to turn out about equal.
In the present example there seems to be a marked disparity. To
evaluate its significance we compute the statistic

$$\chi_r^2 = \frac{12}{mp(p + 1)} \sum T_j^2 - 3m(p + 1) \tag{18.8}$$

which has approximately the $\chi^2$ distribution with $p - 1$ degrees
of freedom. $\chi_r^2$ is identically equal to $m(p - 1)W$ where W is
the coefficient of concordance discussed on page 284. For the data of
Table 18.5, $m = 4$, $p = 5$ and

$$\chi_r^2 = \frac{12}{4(5)(6)} (15^2 + 11^2 + 11^2 + 19^2 + 4^2) - 3(4)(6) = 84.4 - 72 = 12.4$$

For 4 degrees of freedom this is significant at the .02 level.
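
The statistic depends on the data only through the rank totals, as
the following Python sketch of formula (18.8) shows. The function
name is hypothetical; the totals are those of Table 18.5.

```python
def friedman_chi2(rank_totals, m):
    """Statistic (18.8) from treatment rank totals over m subjects."""
    p = len(rank_totals)
    return (12 / (m * p * (p + 1)) * sum(t * t for t in rank_totals)
            - 3 * m * (p + 1))

totals = [15, 11, 11, 19, 4]
check = 4 * 5 * (5 + 1) // 2           # the totals must sum to mp(p + 1)/2
chi2_r = friedman_chi2(totals, m=4)
print(sum(totals) == check, round(chi2_r, 1))   # prints True 12.4
```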

Actually the values of m and p in this example are so small that
the $\chi^2$ distribution evaluates the significance somewhat roughly.
In fact, the 1% point for the test statistic with $m = 4$, $p = 5$, is
10.93; thus the use of the $\chi^2$ table here introduces a certain
distortion. Generally, if $p > 7$ and $m \ge 6$ the $\chi^2$
approximation will be quite adequate. For values below this a set of
data may be more significant than the test indicates. If the sample
sizes are such that $3p + m \ge 20$, whether or not they are large
enough for the $\chi^2$ approximation to serve, another approximation
serves well to evaluate the significance of the statistic
$\chi_r^2$. We may take


$$F = \frac{(m - 1)\chi_r^2}{m(p - 1) - \chi_r^2} \tag{18.9}$$

to be distributed as F with

$$n_1 = p - 1 - \frac{2}{m} \tag{18.10}$$

degrees of freedom for the numerator and

$$n_2 = (m - 1)\left(p - 1 - \frac{2}{m}\right) \tag{18.11}$$

degrees of freedom for the denominator. These degrees of freedom
are not integers and interpolation in the F table may be necessary.
In the foregoing example

$$F = \frac{3(12.4)}{4(4) - 12.4} = \frac{37.2}{3.6} = 10.3$$

$$n_1 = 5 - 1 - \frac{2}{4} = 3.5$$

$$n_2 = 3(3.5) = 10.5$$

In this example, as in many cases, interpolation is not necessary,
since the obtained F is larger than $F_{.99}$ for $n_1 = 3$ and
$n_2 = 10$, and is of necessity larger than $F_{.99}$ for $n_1 = 3.5$
and $n_2 = 10.5$.
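
Formulas (18.9) to (18.11) involve only $\chi_r^2$, m, and p, so the
worked figures can be checked with a small Python sketch (the
function name is illustrative):

```python
def friedman_f_approx(chi2_r, m, p):
    """F ratio and its two degrees of freedom, formulas (18.9)-(18.11)."""
    f = (m - 1) * chi2_r / (m * (p - 1) - chi2_r)
    n1 = p - 1 - 2 / m
    n2 = (m - 1) * n1
    return f, n1, n2

f, n1, n2 = friedman_f_approx(12.4, m=4, p=5)
print(round(f, 1), n1, n2)      # prints 10.3 3.5 10.5
```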


C. Confidence Intervals 


Confidence Interval for the Median. In Chapter 16 a formula 
for the standard error of the median was given. This standard 
error may be used to determine a confidence interval for the median 
when the form of population distribution can be assumed as known. 
We shall now describe a method for computing a confidence
interval for the median when no assumption can be made about the
form of parent population.

The confidence interval can be developed from the simple 
notion that a random observation is as likely to exceed the popu- 
lation median as to be less than that median. All observations in 
a continuous population may be considered as falling in one of two 
classes (smaller than median, not smaller) with a probability of .5 
for each class. For a sample of observations from any population, 
the number of observations above (or below) the median will have 
the binomial distribution described in Chapter 2, with Р = .5. 

Suppose for simplicity that a sample of two observations is to
be drawn, the smaller being denoted $X_1$, and the larger $X_2$. Then
the probability that both $X_1$ and $X_2$ will be less than the
population median is (.5)(.5) = .25. Also the probability that both
observations will exceed the median is .25. The remaining alternative
is that one observation will exceed and the other be less than the
median, and this situation has probability .5. In other words, the
probability that two random observations will include the
population median between them is .50.

The corresponding computation for a larger number of cases 
will be described in relation to the 11 archery scores of Table 9.8, 
in the column headed First Order. Arranged in ascending order 
these scores are 33, 51, 66, 71, 78, 80, 99, 105, 114, 134, 146. 

Let the observations in ascending order be denoted

$$X_1 < X_2 < X_3 < \cdots < X_9 < X_{10} < X_{11}$$

The probability that m observations in a sample of 11 cases will be
smaller than the population median $\xi_{.50}$ is given by the
appropriate entry in row $N = 11$ of Table IVB. In the column headed
1, the entry .006 is the probability that 0 or 1 observation but not
more than 1 will be less than $\xi_{.50}$. Similarly in the column 9
the entry .994 is the probability that 9 or fewer observations will
be less than $\xi_{.50}$. Hence for any sample of 11 cases

$$P(X_2 < \xi_{.50} < X_{10}) = .994 - .006 = .988$$

Applied to the 11 observed archery scores, this probability becomes
the confidence interval

$$C(51 < \xi_{.50} < 134) = .988$$

The reader may verify by similar methods that

$$P(X_3 < \xi_{.50} < X_9 \mid N = 11) = .934$$
$$P(X_4 < \xi_{.50} < X_8 \mid N = 11) = .774$$

and for the 11 archery scores

$$C(66 < \xi_{.50} < 114) = .934$$
$$C(71 < \xi_{.50} < 105) = .774$$

Table IVB is useful for samples of 25 or fewer. For larger samples
we use the normal approximation to the binomial with mean
$NP = .5N$ and standard deviation $\sqrt{NPQ} = .5\sqrt{N}$. We take
the integer next larger than $z_{1-\alpha/2}(.5\sqrt{N})$ and count
that many observations to the left and to the right from the sample
median. The observations thus reached are the limits of a confidence
interval with coefficient $1 - \alpha$ or larger. Thus if
$\alpha = .05$ and $N = 36$ the sample median lies between $X_{18}$
and $X_{19}$, and

$$z_{1-\alpha/2}(.5\sqrt{N}) = 1.96(.5)(6) = 5.88$$

Counting off 6 observations each way from the median produces
the interval $X_{13} < \xi_{.50} < X_{24}$.
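
The three intervals for the archery scores can be verified from the
binomial distribution directly. In the Python sketch below the
function name is illustrative; it returns the kth observation from
each end together with the confidence coefficient of the interval
they bound.

```python
from math import comb

def median_interval(sorted_scores, k):
    """Interval (X_k, X_{N+1-k}) and its confidence coefficient."""
    n = len(sorted_scores)
    # Probability that fewer than k observations fall below the median.
    tail = sum(comb(n, j) for j in range(k)) / 2 ** n
    return sorted_scores[k - 1], sorted_scores[n - k], 1 - 2 * tail

archery = [33, 51, 66, 71, 78, 80, 99, 105, 114, 134, 146]
for k in (2, 3, 4):
    low, high, coefficient = median_interval(archery, k)
    print(low, high, round(coefficient, 3))
# prints:
# 51 134 0.988
# 66 114 0.935
# 71 105 0.773
```

The last two coefficients differ by a unit in the third decimal from
.934 and .774 because the tabled entries were rounded before being
subtracted.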

Kolmogorov-Smirnov Confidence Band for Cumulative Frequency.
From the cumulative percentage distribution of a sample
this method enables one to draw a confidence band for the
cumulative percentage distribution of the population. For example
consider a sample of 10 observations:

9.1, 10.3, 11.6, 12.4, 12.5, 13.0, 13.6, 14.2, 16.1, and 18.7

The cumulative percentage distribution for these is shown by the
solid line in Figure 18-1. In order to construct a band such that
the cumulative population distribution can with confidence
$1 - \alpha$ be asserted to lie within it, we enter Table 18.5 with
N = number of cases in the sample and obtain a value of $ND_\alpha$.
In the row $N = 10$ and column $\alpha = .01$ we find $ND_\alpha = 5$
or $D_\alpha = .5$. Then one dotted line is drawn parallel to the
sample line in Figure 18-1 and 0.5 above it; a second dotted line is
drawn parallel to the sample line and 0.5 below it. We may now
assert, with confidence .99, that the cumulative percentage
distribution for the population from which our sample was drawn lies
inside the band bounded by these dotted lines.

Fig. 18-1. Cumulative relative frequency (vertical axis) plotted
against the scale of scores, 9 to 19 (horizontal axis).
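
The limits of the band at the observed scores may be tabulated with
a short Python sketch. It treats the sample distribution as a step
function evaluated at each observation, and it clips the limits to
the range 0 to 1; both choices are assumptions made for this
illustration, as is the function name.

```python
def ks_band(sample, d):
    """(score, lower, sample cumulative proportion, upper) per score."""
    xs = sorted(sample)
    n = len(xs)
    band = []
    for i, x in enumerate(xs, start=1):
        f = i / n                                  # cumulative proportion
        band.append((x, max(0.0, f - d), f, min(1.0, f + d)))
    return band

scores = [9.1, 10.3, 11.6, 12.4, 12.5, 13.0, 13.6, 14.2, 16.1, 18.7]
band = ks_band(scores, d=0.5)                      # D = .5 for N = 10
for row in band:
    print(row)
```

With so few cases the band is very wide: at the score 12.5, for
instance, it runs from 0 to 1.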


TABLE 18.5 Values of $ND_\alpha$.* $D_\alpha$ = Smallest absolute discrepancy between an
observed and hypothetical frequency distribution which will cause rejection of
hypothesis at significance level $\alpha$


N    α ≤ .05    α ≤ .03    α ≤ .01        N    α ≤ .05    α ≤ .03    α ≤ .01


8 = 4 5 41 9 10 11 

9 4 5 42-47 10 11 
10 5 48 11 12 
11-12 5 6 49 10 12 
13-14 5 6 50-56 10 11 12 
15-18 6 7 57-59 11 13 
19 6 7 60-65 11 12 13 
20-21 6 Ж 8 66—67 11 12 14 
22-24 7 8 68-70 12 14 
25 7 8 71-76 12 13 14 
26-28 7 8 9 84-87 13 14 15 
29-31 8 9 88-91 13 14 16 
32 8 9 92-94 13 14 
33-36 8 9 10 95 14 
37-40 9 10 


96-100 14 15

* Abstracted from Birnbaum


The same method affords a test of goodness of fit. If the
cumulative distribution function specified by some hypothesis about
the population lies entirely within the confidence band constructed
with confidence coefficient $1 - \alpha$ around the sample distribution,
then the hypothesis is not rejected. If at any point the
hypothetical curve lies outside the band, the hypothesis is rejected
at level $\alpha$.

Comparison of Kolmogorov-Smirnov Method with $\chi^2$. The $\chi^2$
goodness of fit test, which is also non-parametric, is applicable to
either continuous or discrete variables; the Kolmogorov-Smirnov
test may be applied only to continuous variables. The $\chi^2$ test can
be applied where only the form of the distribution is hypothesized
and some parameters remain to be estimated from the sample;
the K-S test can be used only where the hypothesis specifies the
hypothetical distribution completely, giving the form and the
numerical values of all parameters of the distribution. The K-S
test can be used, as in the illustration, with samples too small to
justify the $\chi^2$ test. The K-S test requires less computation than
the $\chi^2$ test. Finally, where the Kolmogorov-Smirnov test is
applicable, there are good reasons to believe it is a more sensitive
test than the $\chi^2$ test.

Confidence Interval for $\mu_x - \mu_y$. If the distributions of two
populations can be assumed to have the same shape, differing
only by translation, then a confidence interval for the amount of
translation, that is $\mu_x - \mu_y$, can be constructed by graphical
methods.

Let the scores in one sample be denoted $X_1, X_2, \ldots, X_{N_1}$ and
in the other $Y_1, Y_2, \ldots, Y_{N_2}$. It is convenient though not
necessary to have all scores positive with the smallest not much
greater than zero. To make this transformation, add a constant
(positive or negative as may be necessary) to each of the
$N_1 + N_2$ scores. On graph paper, plot the new X-values along the
horizontal and the new Y-values along the vertical axis. Place a dot
at the point determined by each (X, Y) pair of values. There will be
$N_1 N_2$ such points.

If a 45° line were drawn through the origin, all points to the
right of that line would have X > Y and all points to the left have
Y > X. Instead of a 45° line through the origin, two 45° lines will
be drawn so that a predetermined number of points will lie outside
them. The X-values of the intersections of these lines with the
horizontal axis will provide the limits of the confidence interval.
If $N_1$ and $N_2$ are both 8 or more and $\alpha$ is not small (e.g.,
.01 with $N_1, N_2 = 8$ or .001 with $N_1, N_2 \ge 15$) the number of
points outside the pair of lines on each side is

$$U = \frac{N_1 N_2}{2} - z_{1-\alpha/2} \sqrt{\frac{N_1 N_2 (N_1 + N_2 + 1)}{12}} \tag{18.12}$$

After U has been computed, take a 45° drawing triangle and
construct a line at an angle of 45° to the axes, having U of the
(X,Y) points above it and passing through the (U + 1)st point,
counting from the left. Construct a second 45° line having U of the
plotted points below it and passing through the (U + 1)st point
counting from the right. The intersections of these lines with the
horizontal (or with the vertical) axis give the confidence interval.

As an example, consider the data previously used in the
discussion of sum of ranks with $N_1 = 9$, and $N_2 = 8$.

X: 11.5, 12.6, 19.4, 21.3, 32.5, 18.6, 17.0, 23.4, and 29.6

Y: 15.2, 8.6, 9.3, 14.4, 15.6, 11.8, 16.3, 17.8

We shall find the 95% confidence interval and also for purposes
of illustration the 99.9% interval for $\mu_x - \mu_y$.

The first step is to subtract 8 from each observation; to plot
the resulting values of X on the horizontal axis and Y on the
vertical and to mark the 72 (X,Y) points. The purpose of subtracting
8 is merely to produce a more compact chart, with plotted points
nearer to the origin. The second step is to compute

$$U = \frac{9(8)}{2} - 1.96 \sqrt{\frac{9(8)(18)}{12}} = 15.6$$

On the diagram two lines are drawn each excluding 15 points and
passing through the 16th. (The line at the right passes through
both the 16th and 17th points.) These are the solid lines in
Figure 18-2. They intersect the horizontal axis at 1.4 and 10.0,
so we write

$$C(1.4 \le \mu_x - \mu_y \le 10.0) = .95$$

If $\alpha$ is very small Formula (18.12) is inadequate. Then we
read from White's tables the sum of ranks R which is significantly
small at level $\alpha$. From this value of R, the number of points to
be excluded by each 45° line is computed as

$$U = R - \frac{N_2(N_2 + 1)}{2} \tag{18.13}$$

where $N_2$ is the number of cases in the smaller sample. For an
interval with confidence coefficient .999, $\alpha = .001$ and White's
table shows R = 40 if $N_1 = 9$ and $N_2 = 8$. Then U = 40 - 36 = 4.
The dashed lines in Figure 18-2 correspond to this value. The
line at the left excludes four points and passes through the fifth
and sixth. The lower line excludes four points and passes through
the fifth. In this case the lower line intersects the x-axis at 14.7,
and the upper line intersects the y-axis at 3.7; it is clear that this
line if extended intersects the x-axis at -3.7. We therefore write:

$$.999 = C[-3.7 \le \mu_x - \mu_y \le 14.7]$$

The justification of this procedure rests on the Mann-Whitney
test which is described on page 434.
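
The two values of U in the example can be recomputed as follows.
This Python sketch assumes that Formula (18.12) has the usual
normal-approximation form, U = N1 N2 / 2 minus z times the square
root of N1 N2 (N1 + N2 + 1) / 12, an assumption which reproduces the
15.6 of the worked example; the function names are illustrative.

```python
import math

def excluded_points_normal(n1, n2, z):
    """U of (18.12), assumed form: points excluded on each side."""
    return n1 * n2 / 2 - z * math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)

def excluded_points_table(r, n_smaller):
    """U of (18.13) from a tabled rank sum r for the smaller sample."""
    return r - n_smaller * (n_smaller + 1) // 2

u_95 = excluded_points_normal(9, 8, 1.96)
u_999 = excluded_points_table(40, 8)
print(round(u_95, 1), u_999)      # prints 15.6 4
```

Only the whole number of points is used in drawing the lines, so
15.6 leads to 15 excluded points on each side.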
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Graphic Method of Obtaining Confidence Interval for E(d). 
This convenient procedure devised by John Tukey (unpublished 
paper) is related to the signed rank test for paired comparisons 
described on page 432. The underlying assumption is that the dis- 
tribution of X4, and of X», differ only by translation. 

The differences 4, = Ха, ~ Xn, (taken with regard to sign) are 
denoted by heavy dots on a vertical scale as in Figure 18-3, the 
top point being here marked B and the bottom C. A point midway 
between B and C is found and marked А. On the horizontal line 
through A a point D is marked at some convenient distance. The 
line segments BD and CD form an isosceles triangle. Through 
each dot on the line BC one line is drawn parallel to CD and an- 
other parallel to BD. Each intersection (including those on the - 
vertical scale) is marked with a heavy dot. The number of such 
dots is N(N + 1)/2. 

This construction is simplified either by use of triangular co- 
ordinate paper or by choosing D with the aid of a draftsman's 
triangle in such а way as to facilitate constructing the parallel 
lines with the same triangle. 
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The confidence interval will be an interval on the vertical 
scale. To find it, read from Table 18.3, T,, the largest significant 
value for Т, which is the absolute value of the smaller rank sum. 
To this value add 1. In the example on page 433 which has been 
used for Figure 18-3, T, = 4, and T,+1=5. Find the (Т, + 1)st 
dot (which in Figure 18-3 is the 5th) from the top and draw a 
horizontal line through it. Indicate the intersection of this hori- 


Fic. 18-3 


zontal line with the vertical scale as U.* Now find the (T, + 1)st 
dot from the bottom, draw a horizontal line through it, and indicate 
the intersection of this line with the vertical as L. The points U 
and L are then the endpoints of the confidence interval. In 
Figure 18-8 these points have scale values 1.25 and .15. There- 
fore we can write 


C(.15 < E(d) < 1.25) = 1— .05 = .95 
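The heights of the dots in the triangular construction can be computed directly. A line through d_i parallel to CD and a line through d_j parallel to BD have equal and opposite slopes, so they meet at height (d_i + d_j)/2; the N(N + 1)/2 dots are therefore the averages of all pairs of differences, each difference also being paired with itself. The Python sketch below rests on that observation. The function name and the differences are hypothetical, and the value of T_α must still be read from the table of the signed rank test.

```python
from itertools import combinations_with_replacement

def paired_interval(diffs, t_alpha):
    """Endpoints L and U: the (t_alpha + 1)st dot from the bottom and from the top."""
    # Heights of the dots: averages of every pair d_i, d_j with i <= j.
    dots = sorted((a + b) / 2 for a, b in combinations_with_replacement(diffs, 2))
    k = t_alpha + 1
    if k > len(dots):
        raise ValueError("t_alpha is too large for this number of differences")
    return dots[k - 1], dots[-k]

# Hypothetical differences, chosen only to show the mechanics.
d = [1, 2, 3]
print(len(list(combinations_with_replacement(d, 2))))  # N(N + 1)/2 dots
print(paired_interval(d, 1))                           # 2nd dot from each end
```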


D. Tests of Independence 


Where each observation consists of a pair of values — say a 
measurement of intelligence and one of scholastic achievement, we 
may ask, “Are the two qualities interrelated, or independent of 
one another?” The student is well acquainted with numerous 
attacks on this problem. If the population of measurements on 
the two variables is bivariate normal then the product moment 
correlation coefficient, r, and the tests of significance discussed in 
Chapter 10 are appropriate. 


* Upper horizontal line has been incorrectly drawn through 4th instead of 5th point.

If less is known about the form of the parent population, or if
the population is known not to be bivariate normal, the scores
may be replaced by their ranks and Spearman’s rank correlation
coefficient, R, or Kendall’s τ (page 286) may be calculated and
tested for significance. With fairly large samples confidence
intervals may be obtained for τ. Finally, χ² provides a test of
independence. This test may be readily applied to a bivariate
distribution by dividing the scatter diagram into 4 quadrants by
lines drawn through the medians of the two variables. The χ²
test is then applied to the numbers of scores in the quadrants. The
data need not even be of the sort which permit ranking, but may
be strictly categorical. For example, the characteristics “Favorite
sport” and “Occupation” may or may not be related. Any
statistical test of their independence will have to be one which
does not rely on order. These three methods (R, τ, and χ²) all
provide well-known distribution-free tests of the null hypothesis
of independence between two qualities or characteristics.
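The median-quadrant form of the χ² test can be sketched in a few lines of Python. The function name is hypothetical; dropping points that fall exactly on a median line and omitting any continuity correction are assumptions of this sketch, not rules stated in the text. The resulting value would be referred to the χ² distribution with 1 degree of freedom, whose 95th percentile is 3.84.

```python
from statistics import median

def median_quadrant_chi_square(xs, ys):
    """Count points in the four median quadrants and compute chi-square (1 df)."""
    mx, my = median(xs), median(ys)
    counts = [[0, 0], [0, 0]]          # counts[x above median][y above median]
    for x, y in zip(xs, ys):
        if x == mx or y == my:
            continue                   # assumption: points on a median line are dropped
        counts[int(x > mx)][int(y > my)] += 1
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [counts[0][j] + counts[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_tot[i] * col_tot[j] / n
            if expected > 0:
                chi2 += (counts[i][j] - expected) ** 2 / expected
    return counts, chi2

# Hypothetical scores: perfectly ordered pairs fall in two opposite quadrants.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
print(median_quadrant_chi_square(xs, xs))
```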

Corner Test of Association. There is another test of inde- 
pendence which is intuitively appealing and easy to perform, 
having as its chief drawback the necessity of drawing a scatter
diagram. It has the practical advantage that no tables are
necessary for evaluating significance. The test essentially ignores 
the mass of data near the center of the scatter diagram and is 
based on those observations at the periphery. If association of 
one variable with another in the extreme cases is of central interest 
then there is probably no more suitable test known. 

The application of the procedure is made clear by considering 
an example. Suppose that 18 subjects have the following scores 
on a scale of ethnocentrism (X) and of verbal intelligence (Y): 


Y 32 36 32 37 41 18 43 27 49 47 28 32 46 32 22 45 40 29 
X —11219 1 8 —11412 —425 —3 11 2 6 2 9 —4.6 


The points are plotted in a scatter diagram and then a horizontal
line is drawn through the Y median (Y_.50 = 34) and a vertical line
through the X median (X_.50 = 4). Then the line L_1 is obtained by
placing a ruler horizontally at the top of the diagram and moving it
down until a point is reached which lies on the opposite side of the
X median from the topmost point. The number of points above the line
L_1 is called r_1. If these points (there is one in the diagram) lie
in the first quadrant, r_1 is recorded with a plus sign; if they lie
in the second quadrant, with a minus sign. The line L_2 is then
located by moving a vertically placed ruler in from the right-hand
edge until a point is reached on the opposite side of the Y median
from the point farthest to the right. The number of points to the
right of L_2 is called r_2. In the example r_2 = 1, and a plus sign is
affixed because the point is in the first quadrant. The lines L_3 and
L_4 are similarly constructed, and r_3 and r_4 are found by counting.
Points in the first and third quadrants are counted positively;
points in the second and fourth quadrants are counted negatively. The
algebraic sum r = r_1 + r_2 + r_3 + r_4 is the test statistic. Provided
N ≥ 10 the null hypothesis of independence is rejected at the 5%
level if r exceeds 11 in absolute value, and at the 1% level if r
exceeds 14 in absolute value. In the present case the null hypothesis
is not rejected since r = −1 + 1 + 3 + 1 = 4 only.
If the number of cases is not an even number then each median
line will pass through a point. If it is the same point it may be 
omitted. Otherwise the two points, one lying on the X median 
line and the other on the Y median line, are removed and one new 
point is added; it has the X coordinate of the point which lay on 
the Y median and the Y coordinate of the other. Then the test 
is carried out as before. 
The test is designed for use with continuous populations so ties
should not arise. If a small number of ties do appear, however,
they will require special treatment. If two points (or four, or six,
etc.) lie on, say, the X median line, then the first is displaced
slightly to right or left (choose the direction at random) and the
other is slightly displaced in the opposite direction. Similar
instructions apply to pairs of points lying on the Y median line.
If in constructing one of the L lines one encounters a set of tied
observations, he draws the line through them all but takes for
the contribution to r a number (which may be a fraction):


$\frac{n_1}{n_2 + 1}$


where n_1 is the number of the observations in the tied set which
are on the same side of the median as the initial point, and n_2 the
number of them on the opposite side.
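The counting procedure for the corner test can be sketched in Python as follows. This is only an illustration under simplifying assumptions: an even number of cases, no ties, and no point lying on a median line, so the special rules for odd N and for ties are not implemented. The function name and the data are hypothetical.

```python
from statistics import median

def corner_sum(xs, ys):
    """Algebraic sum of the four corner counts (no ties, even N assumed)."""
    mx, my = median(xs), median(ys)
    pts = list(zip(xs, ys))
    # Each sweep: (coordinate the ruler moves along, start from the high end?,
    #              deviation from the median on the other axis)
    sweeps = [
        (lambda p: p[1], True,  lambda p: p[0] - mx),   # down from the top
        (lambda p: p[0], True,  lambda p: p[1] - my),   # in from the right
        (lambda p: p[1], False, lambda p: p[0] - mx),   # up from the bottom
        (lambda p: p[0], False, lambda p: p[1] - my),   # in from the left
    ]
    total = 0
    for key, from_high, other in sweeps:
        ordered = sorted(pts, key=key, reverse=from_high)
        first = ordered[0]
        run = 0
        for p in ordered:
            if other(p) * other(first) <= 0:
                break                  # first point on the opposite side of the median
            run += 1
        # First and third quadrants count positively, second and fourth negatively.
        sign = 1 if (first[0] - mx) * (first[1] - my) > 0 else -1
        total += sign * run
    return total

# Hypothetical data: a perfect increasing relation gives the largest positive sum.
print(corner_sum([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6]))
```

A point that is extreme on both scales enters two of the four counts, which matches the remark in the text that such a point is counted twice.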

It is clear that the test depends chiefly on the extreme observa- 
tions and depends upon the degree to which the data are concen- 
trated in diagonally opposite corners. A study of the diagram 
shows that a point (such as the one in the lower left-hand corner) 
which is an extreme deviate on both the X and Y scales is counted 
twice. This test does not depend on scaling of the values for X 
and Y; only the relative order positions of the X's and Y's are 
utilized. At the same time the procedure does not yield an index 
which measures degree of association in the way that the
correlation coefficient does.

Tables and Charts in the Appendix


Areas Under the Unit Normal Curve
Percentile Values of the Unit Normal Curve
Ordinates and Areas of the Normal Curve
Cumulative Binomial Probabilities. Sum of the First (m + 1) Terms in the Expansion of (Q + P)^N
    A. Q = .25 and P = .75
    B. Q = P = .50
    C. Q = .75 and P = .25
Numerical Coefficients in the Expansion of (Q + P)^N
Interval Estimate of Population Proportion P with Confidence Coefficient .95
95th and 99th Percentile Values of F_max = s²_max/s²_min in a Set of k Mean Squares, each Based on n Degrees of Freedom
Percentile Values of the Chi-square Distribution
Percentile Values of “Student’s” Distribution
95th and 99th Percentile Values of the F Distribution
Percentile Values of r for n Degrees of Freedom when ρ = 0
Values for Transforming r into z = ½ log_e [(1 + r)/(1 − r)]
Values of the Product-Moment Coefficient of Correlation in a Normal Bivariate Population Corresponding to Given Proportions of Success
Interval Estimate for ρ with Confidence Coefficient .95
Chart for Estimating the Correlation Coefficient
Percentile Values of the Rank Order Correlation Coefficient in Samples of N Cases from an Uncorrelated Population
Percentile Estimates of the Mean
Percentile Estimates of the Standard Deviation
Transformation of a Proportion (p) to Radians (φ)
Transformation of Ranks to Standard Scores
Squares, Square Roots, and Reciprocals
Logarithms
Random Numbers
Preregistration Scores of 447 College Students on the Cooperative Service English Test
TABLE I

Areas Under the Unit Normal Curve

z_α     z_(1−α)    α      1−α     2α
 0.0     0.0      .500   .500   1.000
−0.1     0.1      .460   .540    .920
−0.2     0.2      .421   .579    .841
−0.3     0.3      .382   .618    .764
−0.4     0.4      .345   .655    .689
−0.5     0.5      .309   .691    .617
−0.6     0.6      .274   .726    .549
−0.7     0.7      .242   .758    .484
−0.8     0.8      .212   .788    .424
−0.9     0.9      .184   .816    .368
−1.0     1.0      .159   .841    .317
−1.1     1.1      .136   .864    .271
−1.2     1.2      .115   .885    .230
−1.3     1.3      .097   .903    .194
−1.4     1.4      .081   .919    .162
−1.5     1.5      .067   .933    .134
−1.6     1.6      .055   .945    .110
−1.7     1.7      .045   .955    .089
−1.8     1.8      .036   .964    .072
−1.9     1.9      .029   .971    .057
−2.0     2.0      .023   .977    .046

z_α     z_(1−α)
−2.1     2.1
−2.2     2.2
−2.3     2.3
−2.4     2.4
−2.5     2.5
−2.6     2.6
−2.7     2.7
−2.8     2.8
−2.9     2.9
−3.0     3.0
−3.1     3.1
−3.2     3.2
−3.3     3.3
−3.4     3.4
−3.5     3.5
−3.6     3.6
−3.7     3.7
−3.8     3.8
−3.9     3.9
−4.0     4.0
TABLE II

Percentile Values of the Unit Normal Curve *

Area to            Area to            Area to            Area to            Area to
the left           the left           the left           the left           the left
of z       z     | of z       z     | of z       z     | of z      z      | of z       z

.0001   −3.719   | .045    −1.695   | .280    −.583    | .700    .524     | .950     1.645
.0002   −3.540   | .050    −1.645   | .300    −.524    | .720    .583     | .955     1.695
.0003   −3.432   | .055    −1.598   | .320    −.468    | .740    .643     | .960     1.751
.0004   −3.353   | .060    −1.555   | .340    −.412    | .750    .6745    | .965     1.812
.0005   −3.291   | .065    −1.514   | .360    −.358    | .760    .706     | .970     1.881
.001    −3.090   | .070    −1.476   | .380    −.305    | .780    .772     | .975     1.960
.002    −2.878   | .075    −1.440   | .400    −.253    | .800    .842     | .980     2.054
.003    −2.748   | .080    −1.405   | .420    −.202    | .820    .915     | .985     2.170
.004    −2.652   | .085    −1.372   | .440    −.151    | .840    .994     | .990     2.326
.005    −2.576   | .090    −1.341   | .460    −.100    | .860    1.080    | .991     2.366
.006    −2.512   | .095    −1.311   | .480    −.050    | .880    1.175    | .992     2.409
.007    −2.457   | .100    −1.282   | .500     .000    | .900    1.282    | .993     2.457
.008    −2.409   | .120    −1.175   | .520     .050    | .905    1.311    | .994     2.512
.009    −2.366   | .140    −1.080   | .540     .100    | .910    1.341    | .995     2.576
.010    −2.326   | .160    −.994    | .560     .151    | .915    1.372    | .996     2.652
.015    −2.170   | .180    −.915    | .580     .202    | .920    1.405    | .997     2.748
.020    −2.054   | .200    −.842    | .600     .253    | .925    1.440    | .998     2.878
.025    −1.960   | .220    −.772    | .620     .305    | .930    1.476    | .999     3.090
.030    −1.881   | .240    −.706    | .640     .358    | .935    1.514    | .9995    3.291
.035    −1.812   | .250    −.6745   | .660     .412    | .940    1.555    | .9996    3.353
.040    −1.751   | .260    −.643    | .680     .468    | .945    1.598    | .9999    3.719


* Entries in this table are taken from The Kelley Statistical Tables, Harvard University
Press, 1938, revised 1948, by permission of the author, Truman Lee Kelley.
z is the percentile value named by the area to the left of z. Thus z_.30 = −.524 is
the 30th percentile. Discussion of the reading of this table is found on page 34.
TABLE III  Ordinates and Areas of the Normal Curve *
(In terms of σ units)


x/σ   Area   Ordinate | x/σ   Area   Ordinate | x/σ    Area   Ordinate
.00   .0000   .3989   | .50   .1915   .3521   | 1.00   .3413   .2420
.01   .0040   .3989   | .51   .1950   .3503   | 1.01   .3438   .2396
.02   .0080   .3989   | .52   .1985   .3485   | 1.02   .3461   .2371
.03   .0120   .3988   | .53   .2019   .3467   | 1.03   .3485   .2347
.04   .0160   .3986   | .54   .2054   .3448   | 1.04   .3508   .2323
.05   .0199   .3984   | .55   .2088   .3429   | 1.05   .3531   .2299
.06   .0239   .3982   | .56   .2123   .3410   | 1.06   .3554   .2275
.07   .0279   .3980   | .57   .2157   .3391   | 1.07   .3577   .2251
.08   .0319   .3977   | .58   .2190   .3372   | 1.08   .3599   .2227
.09   .0359   .3973   | .59   .2224   .3352   | 1.09   .3621   .2203
.10   .0398   .3970   | .60   .2257   .3332   | 1.10   .3643   .2179
.11   .0438   .3965   | .61   .2291   .3312   | 1.11   .3665   .2155
.12   .0478   .3961   | .62   .2324   .3292   | 1.12   .3686   .2131
.13   .0517   .3956   | .63   .2357   .3271   | 1.13   .3708   .2107
.14   .0557   .3951   | .64   .2389   .3251   | 1.14   .3729   .2083
.15   .0596   .3945   | .65   .2422   .3230   | 1.15   .3749   .2059
.16   .0636   .3939   | .66   .2454   .3209   | 1.16   .3770   .2036
.17   .0675   .3932   | .67   .2486   .3187   | 1.17   .3790   .2012
.18   .0714   .3925   | .68   .2517   .3166   | 1.18   .3810   .1989
.19   .0753   .3918   | .69   .2549   .3144   | 1.19   .3830   .1965
.20   .0793   .3910   | .70   .2580   .3123   | 1.20   .3849   .1942
.21   .0832   .3902   | .71   .2611   .3101   | 1.21   .3869   .1919
.22   .0871   .3894   | .72   .2642   .3079   | 1.22   .3888   .1895
.23   .0910   .3885   | .73   .2673   .3056   | 1.23   .3907   .1872
.24   .0948   .3876   | .74   .2703   .3034   | 1.24   .3925   .1849
.25   .0987   .3867   | .75   .2734   .3011   | 1.25   .3944   .1826
.26   .1026   .3857   | .76   .2764   .2989   | 1.26   .3962   .1804
.27   .1064   .3847   | .77   .2794   .2966   | 1.27   .3980   .1781
.28   .1103   .3836   | .78   .2823   .2943   | 1.28   .3997   .1758
.29   .1141   .3825   | .79   .2852   .2920   | 1.29   .4015   .1736
.30   .1179   .3814   | .80   .2881   .2897   | 1.30   .4032   .1714
.31   .1217   .3802   | .81   .2910   .2874   | 1.31   .4049   .1691
.32   .1255   .3790   | .82   .2939   .2850   | 1.32   .4066   .1669
.33   .1293   .3778   | .83   .2967   .2827   | 1.33   .4082   .1647
.34   .1331   .3765   | .84   .2995   .2803   | 1.34   .4099   .1626
.35   .1368   .3752   | .85   .3023   .2780   | 1.35   .4115   .1604
.36   .1406   .3739   | .86   .3051   .2756   | 1.36   .4131   .1582
.37   .1443   .3725   | .87   .3078   .2732   | 1.37   .4147   .1561
.38   .1480   .3712   | .88   .3106   .2709   | 1.38   .4162   .1539
.39   .1517   .3697   | .89   .3133   .2685   | 1.39   .4177   .1518
.40   .1554   .3683   | .90   .3159   .2661   | 1.40   .4192   .1497
.41   .1591   .3668   | .91   .3186   .2637   | 1.41   .4207   .1476
.42   .1628   .3653   | .92   .3212   .2613   | 1.42   .4222   .1456
.43   .1664   .3637   | .93   .3238   .2589   | 1.43   .4236   .1435
.44   .1700   .3621   | .94   .3264   .2565   | 1.44   .4251   .1415
.45   .1736   .3605   | .95   .3289   .2541   | 1.45   .4265   .1394
.46   .1772   .3589   | .96   .3315   .2516   | 1.46   .4279   .1374
.47   .1808   .3572   | .97   .3340   .2492   | 1.47   .4292   .1354
.48   .1844   .3555   | .98   .3365   .2468   | 1.48   .4306   .1334
.49   .1879   .3538   | .99   .3389   .2444   | 1.49   .4319   .1315
.50   .1915   .3521   | 1.00  .3413   .2420   | 1.50   .4332   .1295


* This table is reproduced from J. E. Wert, Educational Statistics, by courtesy of
McGraw-Hill Book Co.
TABLE III  Ordinates and Areas of the Normal Curve (Concluded)
(In terms of σ units)


x/σ   | x/σ    Area   Ordinate
2.00  | 2.50   .4938   .0175
2.01  | 2.51   .4940   .0171
2.02  | 2.52   .4941   .0167
2.03  | 2.53   .4943   .0163
2.04  | 2.54   .4945   .0158
2.05  | 2.55   .4946   .0154
2.06  | 2.56   .4948   .0151
2.07  | 2.57   .4949   .0147
2.08  | 2.58   .4951   .0143
2.09  | 2.59   .4952   .0139
2.10  | 2.60   .4953   .0136
2.11  | 2.61   .4955   .0132
2.12  | 2.62   .4956   .0129
2.13  | 2.63   .4957   .0126
2.14  | 2.64   .4959   .0122
2.15  | 2.65   .4960   .0119
2.16  | 2.66   .4961   .0116
2.17  | 2.67   .4962   .0113
2.18  | 2.68   .4963   .0110
2.19  | 2.69   .4964   .0107
2.20  | 2.70   .4965   .0104
2.21  | 2.71   .4966   .0101
2.22  | 2.72   .4967   .0099
2.23  | 2.73   .4968   .0096
2.24  | 2.74   .4969   .0093
2.25  | 2.75   .4970   .0091
2.26  | 2.76   .4971   .0088
2.27  | 2.77   .4972   .0086
2.28  | 2.78   .4973   .0084
2.29  | 2.79   .4974   .0081
2.30  | 2.80   .4974   .0079
2.31  | 2.81   .4975   .0077
2.32  | 2.82   .4976   .0075
2.33  | 2.83   .4977   .0073
2.34  | 2.84   .4977   .0071
2.35  | 2.85   .4978   .0069
2.36  | 2.86   .4979   .0067
2.37  | 2.87   .4979   .0065
2.38  | 2.88   .4980   .0063
2.39  | 2.89   .4981   .0061
2.40  | 2.90   .4981   .0060
2.41  | 2.91   .4982   .0058
2.42  | 2.92   .4982   .0056
2.43  | 2.93   .4983   .0055
2.44  | 2.94   .4984   .0053
2.45  | 2.95   .4984   .0051
2.46  | 2.96   .4985   .0050
2.47  | 2.97   .4985   .0048
2.48  | 2.98   .4986   .0047
2.49  | 2.99   .4986   .0046
2.50  | 3.00   .4987   .0044
TABLE IV †  Cumulative Binomial Probabilities
Sum of First (m + 1) Terms in Expansion of (Q + P)^N
(Decimal points omitted to save space)

$\sum_{r=0}^{m} \binom{N}{r} Q^{N-r} P^{r}$


B. Q = P = .50

      m
 N    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
 5  031  188  500  812  969    *
 6  016  109  344  656  891  984    *
 7  008  062  227  500  773  938  992    *
 8  004  035  145  363  637  855  965  996    *
 9  002  020  090  254  500  746  910  980  998    *
10  001  011  055  172  377  623  828  945  989  999    *
11       006  033  113  274  500  726  887  967  994    *    *
12       003  019  073  194  387  613  806  927  981  997    *    *
13       002  011  046  133  291  500  709  867  954  989  998    *    *
14       001  006  029  090  212  395  605  788  910  971  994  999    *    *
15            004  018  059  151  304  500  696  849  941  982  996    *    *    *
16            002  011  038  105  227  402  598  773  895  962  989  998    *    *
17            001  006  025  072  166  315  500  685  834  928  975  994  999    *
18            001  004  015  048  119  240  407  593  760  881  952  985  996  999
19                 002  010  032  084  180  324  500  676  820  916  968  990  998
20                 001  006  021  058  132  252  412  588  748  868  942  979  994
21                 001  004  013  039  095  192  332  500  668  808  905  961  987
22                      002  008  026  067  143  262  416  584  738  857  933  974
23                      001  005  017  047  105  202  339  500  661  798  895  953
24                      001  003  011  032  076  154  271  419  581  729  846  924
25                           002  007  022  054  115  212  345  500  655  788  885


† Discussion of the reading of this table is found on page 57.
* 1.0 or approximately 1.0.
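Any entry of this kind can be checked by summing the first (m + 1) terms of the expansion directly. The short Python sketch below does so; the function name is hypothetical, and the argument p corresponds to P with Q = 1 − P.

```python
from math import comb

def cumulative_binomial(n, m, p=0.5):
    """Sum of the first (m + 1) terms in the expansion of (Q + P)^N, with Q = 1 - P."""
    q = 1 - p
    return sum(comb(n, r) * q ** (n - r) * p ** r for r in range(m + 1))

# For Q = P = .50, N = 10 and m = 2 the sum is 56/1024, about .055.
print(round(cumulative_binomial(10, 2), 3))
```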
TABLE IV (Continued)
C. Q = .75 and P = .25


1 
8 1 
36 9 
120 45 
330 165 
792 495 
1716 1287 
3432 3003 
6435 6435 
11440 12870 
19448 24310 
31824 43758 
50388 75582 
77520 125970 


1 
10 1 
55 11 
220 66 
715 286 
2002 1001 
5005 3003 
11440 8008 
24310 19448 
48620 43758 
92378 92378 
167960 184756 


TABLE V* Numerical Coefficients in the Expansion of (Q + P)^N


Sum of Coefficients
2^0 = 1
2^1 = 2
2^2 = 4
2^3 = 8
2^4 = 16
2^5 = 32
2^6 = 64
2^7 = 128
2^8 = 256
2^9 = 512
2^10 = 1024
2^11 = 2048
2^12 = 4096
2^13 = 8192
2^14 = 16384
2^15 = 32768
2^16 = 65536
2^17 = 131072
2^18 = 262144
2^19 = 524288
2^20 = 1048576


Note: After 2" part of the distribution is omitted, but the sum can be obtained thus: 


2^13 = 2(1 + 13 + 78 + 286 + 715 + 1287 + 1716)
2^14 = 2(1 + 14 + 91 + 364 + 1001 + 2002 + 3003) + 3432


* Discussion of the reading of this table is found on page 59. 


CHART VI  Interval Estimate of Population Proportion P with Confidence
Coefficient .95 for N = 10, 15, 20, 30, 50, 100, 250, and 1000.*
(Vertical axis: Population Value of P. Horizontal axis: Sample Value of p.)


* Reproduced by permission of the authors and the editor from Clopper, C. J.
and Pearson, E. S., “The Use of Confidence or Fiducial Limits Illustrated in the Case
of the Binomial,” Biometrika, 26 (1934), 404-413.
TABLE VII  95th and 99th Percentile Values of F_max = s²_max/s²_min in a


39.0 87.5 142, 202 266 
d 199. 448 729. 1036 1362 
154 27.8 393 50.7 62.0 
B 47.5 85 120. 151 184 
9.60 15.5 206 252 29.5 
4 232 37 49. 59. 69 
7.15 10.8 13.7 16.3 187 
5 149 22 28, 33 38 
5.82 8.38 104 121 13.7 
б 11.1 15.5 19.1 22 25 
р 199 6.94 844 9.70 108 
8.89 121 14.5 16.5 18.4 
2 4.43 6.00 7.18 8.12 9.03 
7.50 9.9 117 132 14.5 
5 4.03 5.34 6.31 71 7.80 
6.54 85 9.9 ил 121 
H 3.72 4,85 5.67 6.34 6.92 
5.85 74 86 9.6 10.4 
ЫЗ 3.28 416 479 5.30 5.72 
491 61 6.9 7.6 82 
" 2.86 3.54 401 437 4.68 
407 49 55 6.0 64 
cc du ea O4 
= 2.46 2.95 3.29 3.54 3.76 
3.32 38 43 46 4.9 
s E ee 49 
a 2.07 240 261 2.78 291 
2.63 3.0 33 34 36 
4 1.67 1.85 1.96 204 211 
196 22 23 24 24 
3 1.00 1.00 1.00 1.00 1.00 
1.00 1.00 1.00 1.00 1.00 


† Reproduced by permission of the author H. A. David and the Editor of Biometrika.
Values in the column k = 2 and in the rows n = 2 and ∞ are exact. Elsewhere the
Set of k Mean Squares each Based on n Degrees of Freedom †


7 8 9 10 п 12 ^ 
333 403 475 550 626 704 
1705 2063 2432. 2813 3204 3605 
72.9 83.5 93.9 104. 114 124 
216.* 249.* 281* 310:* 337." 361.* 3 
33.6 37.5 411 446 480 514 
79 89 97. 106. 113 120 * 
20.8 22.9 247 26.5 282 29.9 
42 46 50. 54. 57. 60. ? 
15.0 163 175 18.6 197 207 7 
27 30 32 34. 36 37 
118 127 195 148 151 158 z 
20 22 23 24. 26 27 
9.78 10.5 m 117 123 127 P 
158 16.9 179 189 19.8 21 
841 8.95 945 99 103 107 а 
13.1 13.9 147 153 16.0 16.6 
742 787 828 8.66 9.01 гта 
111 118 124 12.9 134 13.9 
6.09 6.42 6.72 7.00 725 Tp ra 
87 91 95 99 10.2 10.6 
4.95 5.19 540 5.59 5.77 BS EE 
6.7 71 73 75 78 80 
3.94 4.10 424 437 449 489. ей 
51 53 55 56 58 5.9 
3.02 312 321 3.29 3.36 3.39 
37 38 39 40 41 42 
217 2.22 2.26 2.30 233 т нз 
2.5 2.5 26 2.6 27 27 
1.00 1.00 1.00 1.00 1.00 OT 
1.00 1.00 1.00 1,00 1.00 1.00 


third digit may be in error by several units. Upper value in each cell is the 95th per- 
centile, lower value is the 99th. * indicates that third digit is uncertain. 
TABLE VIII  Percentile Values of the Chi-square Distribution *


* Abridged from table in Biometrika, Vol. 32 (1941), and published with the permission
of the author, Catherine M. Thompson, and the editor of Biometrika. Three columns
are reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published by
Oliver and Boyd Ltd. by permission of the authors and publishers.
TABLE X  95th and 99th Percentile


95th Percentile in Light-Face Type,
n_1 = degrees of freedom


244 

6,022 6,056 6,082 6,106 

18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39 19.40 19.41 
98.49 99.01 99.17 99.25 9930 99.33 99.34 99.36 99.38 99.40 99.41 99.42 
10.13 9.55 9.28 912 9.01 894 888 884 881 878 8.76 874 
134.12 30.81 29.46 2871 28.24 27.91 27.67 2749 2734 27.23 27.13 27.05 
7.71 694 6.59 639 6.26 616 609 604 6.00 596 5.93 591 
@130 18.00 16.69 1598 15.62 15.31 1498 1480 1466 1454 1445 1437 
661 579 541 519 505 495 488 482 478 474 470 468 
16.26 15.37 12.06 1139 10.97 10.67 10.45 10.27 10.15 10.05 9.96 9.89 
54 . К E х © А 406 403 400 
13.74 10.92 9.78 915 875 847 816 810 798 7.87 7.79 7.72 
559 474 435 412 397 387 Е 79 3.73 3.68 3.63 3.60 3.57 
1115 955 845 785 746 719 700 684 671 662 6.54 6.47 
532 446 407 384 369 3.58 350 344 3.39 3.34 3.31 3.28 
1136 8.65 7.59 701 663 637 619 603 5.91 5.88 5.74 6.67 
5.12 426 386 3.63 348 337 3.29 3.23 3.18 3.13 310 3.07 
1056 8.02. 6.99 642 606 580 562 547 5.35 526 518 5.11 
96 4. 5 . 33 3. , i 297 294 291 
10.0 7.56 655 599 564 539 521 5.06 495 485 478 471 
11|484 398 359 336 320 309 зо! 295 290 288 282 279 
9.65 720 622 667 531 507 488 ATA 463 454 446 440 
12|475 388 349 3.26 311 300 292 2.85 280 2.76 272 2.69 
933 693 6.95 541 5.06 482 465 450 439 430 422 416 
13 | 467 380 341 318 302 292 234 277 272 267 263 2.60 
9.07 670 574 520 486 402 444 430 419 410 402 3.96 


1 . . . : ы 3 2.70 265 260 2.56 2. 
886 651 556 503 469 446 438 414 403 394 386 3 rj 
2.64 250 255 2.51 


868 636 542 499 456 435 270 400 399 380 373 245 
254 249 245 242 
3.78 3.69 361 3.56 
2.50 245 241 2.38 
3.68 3.59 3.52 3.45 
240 241 237 23 
$60 351 344 3. 


17 | 445 3.59 320 296 281 270 2.62 
3.93 


n_2 = degrees of freedom for denominator
Values of the F Distribution *


99th Percentile in Bold-Face Type 
for numerator 


3.02 2.98 
5.00 4.92 


2.86 282 
4.60 4.52 


2.74 2.70 
4.29 4.21 


2.60 
405 3.98 
255 251 
385 318 


248 244 
зло 3.62 


243 239 
3 3.48 


2.37 2.33 
$46 3.37 


233 2.29 
3.35 3.27 
2.29 2.25 
3.27 3.19 
а 268 2.21 


2. 
441 


110 


3.67 


2.39 
3.51 


2.70 
4.25 


2.57 
3. 


2. 
3.51 
2.31 
3 


2. 

3.10 
2.15 
3.00 
2.11 
2.91 
2.07 


2.80 
4.51 


n_2 = degrees of freedom for denominator


* From Snedecor, G. W., Statistical Methods, Iowa State College Press, Inc., by
permission of author and publisher.
TABLE X  95th and 99th Percentile


95th Percentile in Light-Face Type,
n_1 = degrees of freedom


36 

38 
ы 

40 
E 
© 
8 44 
"o 
3 4 
E 48 
8| 5 
E 
© 55 
$| в 

É 410 362 331 3.09 2.93 2.79 2.70 261 2.54 24T 

1 70 | 3.98 313 274 250 235 223 214 207 201 197 193 189 
r3 7.0 4.92 4.08 3.60 3.29 3.07 2.91 2.77 267 2.59 2.51 2.45 


8 
Values of the F Distribution* (Continued)


99th Percentile in Bold-Face Type 
for numerator 


2.08 2.03 197 193 188 184 180 1.76 1.74 | 
183 274 263 255 247 238 233 225 2.21 
206 202 196 191 187 181 178 175 172 
2.80 2.71 2.60 2.52 244 235 230 222 2.18 
205 200 194 190 185 180 173 171 
2.77 2.68 257 249 241 232 219 215 
204 199 193 189 184 179 172 169 
2.74 266 255 247 238 239 2.16 213 
202 1.97 191 186 182 176 169 1.67 
2.0 362 851 242 234 225 212 208 
200 195 189 184 180 174 167 164 
266 258 247 238 230 221 208 204 
1.98 1.93 187 182 178 172 165 1.62 
2.62 2.54 2.43 235 226 217 2.04 2.00 
196 192 185 180 178 171 1.63 1.60 
2.59 2.51 240 232 2.22 3.14 2.00 1.97 b 
195 190 184 179 174 169 161 159 
2.566 2.49 2.37 229 230 211 197 194 a 
194 189 182 178 173 168 1.60 157 E: 
2.54 2.46 3.35 226 2.17 2.08 194 191 
192 188 181 176 172 16% 158 1.56 
2.52 244 232 224 215 206 1.92 1.88 
191 187 180 175 171 165 157 154 
2.50 2.42 2.30 3.22 2.13 2.04 1.990 1.86 
190 186 179 174 170 164 156 153 
2.48 240 228 2.20 211 2.02 1.88 1.84 
190 185 178 174 169 163 155 1.52 


2.46 2.99 2.26 2.18 2.10 2.00 1.86 182 176 


1.88 183 176 172 167 161 158 152 150 146 
248 235 2.23 215 2.06 1.96 190 1.82 25 78 1.71 
186 181 175 170 165 159 156 150 148 144 
240 2.32 220 212 203 193 1.87 179 174 1.68 
1.85 180 1.73 168 163 1.57 1.54 149 146 i 42 
2.37 2.50 2.18 2.09 2.00 1.90 184 176 1.71 
184 179 172 167 162 156 153 147 145 Ы 
135 128 2.15 2,07 198 188 182 174 1.69 14 
1 


182 177 170 165 160 154 151 145 142 1; 
2.32 2.24 2.11 2.03 1.94 184 1.78 1.70 165 1.57 


179 175 168 163 157 151 148 142 139 134 
2.26 2.19 2.06 1.98 1.89 1.79 173 1.64 1.59 1.51 
177 172 165 160 155 149 145 139. 136 131 
3.23 215 203 1.94 185 175 168 1.59 1.54 146 
176 171 164 159 1.54 147 144 137 134 129 125 122 
2.20 2.12 2.00 1.91 1.83 1.72 1.66 156 161 143 137 133 
174 1.69 162 157 152 145 142 135 132 1% 122 119 
217 209 1.97 188 1.79 1.69 162 153 148 139 133 138 
172 167 160 1.54 149 142 138 132 128 122 116 113 
2.12 204 Е 92 1.84 1.74 164 1.57 147 142 182 124 119 
170 165 158 153 147 141 136 130 126 dd 113 108 
2.00 2.00 1.89 181 171 161 164 144 138 119 111 


169.164 157 152 146 140 135 128 124 in 111 100 
2.07 1.99 1.87 1.79 1.69 1.59 162 141 136 120 115 1.00 


TABLE XI*  Percentile Values of r for n Degrees of Freedom when ρ = 0


7.95 


© очо стою | з 
8 


T.975 


.997 
.950 
878 
E 
754 
707 
666 
632 


T.99 


To 


T.95 Т.9995 n To5 Tom Тю Тоз Гоу 
9999 1.000 || 30 | 2% 409 .449 .554 
990 999 35 | .275 881 .418 .519 
.959 .991 40 | .257 358 .393 .490 
917 974 45 | 243 338 .372 .465 
874 951 50 | .231 322 .354 443 
834 925 55 | .220 807 .338 .424 
798 .898 60 | 211 295 .325 .408 
1765 872 65 | .203 284 .312 .393 
435 . .847 70 | .195 274 302 .380 
.708 .823 75 | 189 264 .292 .368 
.684 .801 80 | .183 256 .283 .357 
.661 780 85 | 178 249 275 .347 
641 760 90 | .173 242 .207 338 
628 742 95 | .168 2386 .260 329 
.606 425 100 | .164 280 .254 321 
.590 .708 125 | .147 -206 .228 .288 
575 693 150 | .134 189 .208 .264 
.561 .679 || 175 | .124 174 194 .248 
.549 -665 | 200 | .116 164 .181 .235 
.587 .652 || 300 134 .148 .188 
515 .629 || 500 04 .115 .148 
496 -607 || 1000 078 .081 .104 
487 .597 058 .074 
7.085 1005 — 7.0005 


* Reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published
by Oliver and Boyd Ltd. by permission of the authors and publishers.


TABLE XII*  Values for Transforming r into $z = \tfrac{1}{2}\log_e\frac{1+r}{1-r}$


.0599
.07   .0699


+1587 
2543 
+3452 
-4301 
5080 
+5784 
6411 
.6963 
7443 


‚1684 
.2636 
.3540 
.4382 
:5154 
.5850 
:6469 
47014 
:7487 


оол AUR оь | о їз Ankh were 


1 7857 7895 

1 8210  .8243 

1. .8511 .8538 

1 8764  .8787 

: 5977 .8996 

1. .9154 .9170 

1 .9302 .9316 

1. . 19425 .9436 |. 

i : ‘ -9527 .9536 |. . 
1 J .9611 .9619 | . E 
2.0 .9680 .9687 |. 4 
21 .9738 .9743 | < 

2.2 9785 — .9789 

2.3 .9823 .9827 

24 .9855 -9858 

25 .9881 .9884 

2.6 9903 — .9905 

2.7 .9920 .9922 

2.8 :9935 -9936 |. 

2.9 .9946 — .9947 | - 


* Reprinted abridged from R. A. Fisher and F. Yates, Statistical Tables, published
by Oliver and Boyd Ltd. by permission of the authors and publishers. The figures in
the body of the table are values of r corresponding to z-values read from the scales on the
left and top of the table.

CHART XIV  Confidence belt for the correlation coefficient ρ when α = .05.
Numbers on the curves indicate sample size. (Reproduced with the permission
of E. S. Pearson from David, F. N., Tables of the Correlation Coefficient, The
Biometrika Office, London.)

CHART XV  Chart for estimating the correlation coefficient. (From Frederick
Mosteller, “On Some Useful ‘Inefficient’ Statistics,” Annals of Mathematical
Statistics, 17 (1946), p. 405. Used with the permission of the author and the editor
of Annals.) The use of this chart is discussed on page 420.
TABLE XVI  Percentile Values of the Rank Order Correlation Coefficient in
Samples of N Cases from an Uncorrelated Population


N Ею En Rea Ra Ha Ew Вах Ry Ew Rø 
4 1.00 1.00 = == c ES E — = 
5 .80 .80 .90 .90 .90 90 .90 -90 .90 1.00 
6 

{ 

8 


47 EE ЁТ. 83 83 83 89 94 1.00 
.68 608 л 71 75 45 79 82 86 89 
63 64 167 67 69 71 74 76 81 .86 


— Rs — RE» — Ва — Ro — 


Ros — Ro — Ru ZRo — Re — Ro 


TABLE XVII  Estimates of the Mean of a Normal
Population Based on Percentiles.*


Estimate Efficiency 
X0 .64 
+ Xa) 81 
(Хз Xd X) .88 
IC m + Xos + Хо + Xas) 91 
(Хо + Хо + X + Xm + Xa) 93 


* Based on Mosteller, F., “On some useful inefficient statistics,” Annals of
Mathematical Statistics, 17 (1946), pp. 377-408.


TABLE XVIII Estimates of the Standard Deviation of a Normal
Population Based on Percentiles.*


Estimate Efficiency 
-339(X 93 — Xs) 65 ( 
171K. + Xa — Xo — Хы) 76 | 
J20(X a + Xo Xo- Xy- Xa — Xo) 87 
* Based on Mosteller, F., “On some useful inefficient statistics,” Annals of
Mathematical Statistics, 17 (1946), pp. 377-408.
TABLE XIX* Transformation of a Proportion (p) to Radians (φ) [φ = 2 arcsin √p]

A.

p φ | p φ | p φ | p φ

.001 .0633 | .036 .3818 | .26 1.0701 | .61 1.7926
.002 .0895 | .037 .3871 | .27 1.0928 | .62 1.8132
.003 .1096 | .038 .3924 | .28 1.1152 | .63 1.8338
.004 .1266 | .039 .3976 | .29 1.1374 | .64 1.8546
.005 .1415 | .040 .4027 | .30 1.1593 | .65 1.8755
.006 .1551 | .041 .4078 | .31 1.1810 | .66 1.8965
.007 .1675 | .042 .4128 | .32 1.2025 | .67 1.9177
.008 .1791 | .043 .4178 | .33 1.2239 | .68 1.9391
.009 .1900 | .044 .4227 | .34 1.2451 | .69 1.9606
.010 .2003 | .045 .4275 | .35 1.2661 | .70 1.9823
.011 .2101 | .046 .4323 | .36 1.2870 | .71 2.0042
.012 .2195 | .047 .4371 | .37 1.3078 | .72 2.0264
.013 .2285 | .048 .4418 | .38 1.3284 | .73 2.0488
.014 .2372 | .049 .4464 | .39 1.3490 | .74 2.0715
.015 .2456 | .050 .4510 | .40 1.3694 | .75 2.0944
.016 .2537 | .06 .4949 | .41 1.3898 | .76 2.1177
.017 .2615 | .07 .5355 | .42 1.4101 | .77 2.1412
.018 .2691 | .08 .5735 | .43 1.4303 | .78 2.1652
.019 .2766 | .09 .6094 | .44 1.4505 | .79 2.1895
.020 .2838 | .10 .6435 | .45 1.4706 | .80 2.2143
.021 .2909 | .11 .6761 | .46 1.4907 | .81 2.2395
.022 .2978 | .12 .7075 | .47 1.5108 | .82 2.2653
.023 .3045 | .13 .7377 | .48 1.5308 | .83 2.2916
.024 .3111 | .14 .7670 | .49 1.5508 | .84 2.3186
.025 .3176 | .15 .7954 | .50 1.5708 | .85 2.3462
.026 .3239 | .16 .8230 | .51 1.5908 | .86 2.3746
.027 .3301 | .17 .8500 | .52 1.6108 | .87 2.4037
.028 .3363 | .18 .8763 | .53 1.6308 | .88 2.4341
.029 .3423 | .19 .9021 | .54 1.6509 | .89 2.4655
.030 .3482 | .20 .9273 | .55 1.6710 | .90 2.4981
.031 .3540 | .21 .9521 | .56 1.6911 | .91 2.5319
.032 .3597 | .22 .9764 | .57 1.7113 | .92 2.5681
.033 .3654 | .23 1.0004 | .58 1.7315 | .93 2.6062
.034 .3709 | .24 1.0239 | .59 1.7518 | .94 2.6467
.035 .3764 | .25 1.0472 | .60 1.7722 | .950 2.6906
B. Values of φ₀ and φ₁ for Samples of 10 to 50 *
Sample size φ₀ φ₁ | Sample size
10 .3176 2.8240 | 30
11 .3027 2.8389 | 31
12 .2897 2.8519 | 32
13 .2782 2.8633 | 33
14 .2681 2.8735 | 34
15 .2589 2.8827 | 35
16 .2507 2.8909 | 36
17 .2431 2.8985 | 37
18 .2363 2.9053 | 38
19 .2299 2.9117 | 39
20 .2241 2.9175 | 40
21 .2187 2.9229 | 41
22 .2136 2.9280 | 42
23 .2089 2.9327 | 43
24 .2045 2.9371 | 44
25 .2003 2.9413 | 45
26 .1964 2.9452 | 46
27 .1927 2.9488 | 47
28 .1893 2.9523 | 48
29 .1860 2.9556 | 49


* From Techniques of Statistical Analysis, by permission of the author of
Chapter 16, Churchill Eisenhart, and the publisher, the McGraw-Hill Book Co.
p φ

.951 2.6952
.985 2.8960
.986 2.9044
.987 2.9131
.988 2.9221
.989 2.9315
.990 2.9413
.991 2.9516
.992 2.9625
.993 2.9741
.994 2.9865
.995 3.0001
.996 3.0150
.997 3.0320
.998 3.0521
.999 3.0783

* Discussion of the use of this table is found on page 423.


φ₁


2.9588 
2.9617 
2.9646 
2.9673 
2.9699 
2.9724 
2.9747 
2.9770
2.9792 
2.9813 
2.9833 
2.9853 
2.9871 
2.9889 
2.9907 
2.9924 
2.9940 
2.9956 
2.9971 
2.9986 
3.0001 


Analysis by permission of the author of Chapter 16, 
er, the McGraw-Hill Book Co. 
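
Entries of Table XIX can be reproduced from the formula in its title, φ = 2 arcsin √p. The sketch below is illustrative only. The function names are hypothetical, and the rule used for Part B (replacing p = 0 by 1/(4N) and p = 1 by 1 − 1/(4N)) is an assumption inferred from the tabled values, not a statement taken from the text.

```python
import math


def proportion_to_radians(p: float) -> float:
    """Angular transformation phi = 2 * arcsin(sqrt(p)) for 0 <= p <= 1."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return 2.0 * math.asin(math.sqrt(p))


def phi_limits(n: int):
    """Assumed rule for Part B: phi_0 uses p = 1/(4n), and phi_1 = pi - phi_0."""
    phi0 = proportion_to_radians(1.0 / (4 * n))
    return phi0, math.pi - phi0


if __name__ == "__main__":
    for p in (0.10, 0.25, 0.50):
        print(p, round(proportion_to_radians(p), 4))
    print(tuple(round(v, 4) for v in phi_limits(10)))
```

Because φ(1 − p) = π − φ(p), one half of Part A can be used to check the other half.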
TABLE XXI Squares, Square Roots, and Reciprocals


n | n² | √n | √(10n) | 1/n || n | n² | √n | √(10n) | 1/n

1 | 1 | 1.000 | 3.162 | 1.00000 || 51 | 2601 | 7.141 | 22.583 | .01961
2 | 4 | 1.414 | 4.472 | .50000 || 52 | 2704 | 7.211 | 22.804 | .01923
3 | 9 | 1.732 | 5.477 | .33333 || 53 | 2809 | 7.280 | 23.022 | .01887
4 | 16 | 2.000 | 6.325 | .25000 || 54 | 2916 | 7.348 | 23.238 | .01852
5 | 25 | 2.236 | 7.071 | .20000 || 55 | 3025 | 7.416 | 23.452 | .01818

6 | 36 | 2.449 | 7.746 | .16667 || 56 | 3136 | 7.483 | 23.664 | .01786
7 | 49 | 2.646 | 8.367 | .14286 || 57 | 3249 | 7.550 | 23.875 | .01754
8 | 64 | 2.828 | 8.944 | .12500 || 58 | 3364 | 7.616 | 24.083 | .01724
9 | 81 | 3.000 | 9.487 | .11111 || 59 | 3481 | 7.681 | 24.290 | .01695
10 | 100 | 3.162 | 10.000 | .10000 || 60 | 3600 | 7.746 | 24.495 | .01667

11 | 121 | 3.317 | 10.488 | .09091 || 61 | 3721 | 7.810 | 24.698 | .01639
12 | 144 | 3.464 | 10.954 | .08333 || 62 | 3844 | 7.874 | 24.900 | .01613
13 | 169 | 3.606 | 11.402 | .07692 || 63 | 3969 | 7.937 | 25.100 | .01587
14 | 196 | 3.742 | 11.832 | .07143 || 64 | 4096 | 8.000 | 25.298 | .01562
15 | 225 | 3.873 | 12.247 | .06667 || 65 | 4225 | 8.062 | 25.495 | .01538

16 | 256 | 4.000 | 12.649 | .06250 || 66 | 4356 | 8.124 | 25.690 | .01515
17 | 289 | 4.123 | 13.038 | .05882 || 67 | 4489 | 8.185 | 25.884 | .01493
18 | 324 | 4.243 | 13.416 | .05556 || 68 | 4624 | 8.246 | 26.077 | .01471
19 | 361 | 4.359 | 13.784 | .05263 || 69 | 4761 | 8.307 | 26.268 | .01449
20 | 400 | 4.472 | 14.142 | .05000 || 70 | 4900 | 8.367 | 26.458 | .01429

21 | 441 | 4.583 | 14.491 | .04762 || 71 | 5041 | 8.426 | 26.646 | .01408
22 | 484 | 4.690 | 14.832 | .04545 || 72 | 5184 | 8.485 | 26.833 | .01389
23 | 529 | 4.796 | 15.166 | .04348 || 73 | 5329 | 8.544 | 27.019 | .01370
24 | 576 | 4.899 | 15.492 | .04167 || 74 | 5476 | 8.602 | 27.203 | .01351
25 | 625 | 5.000 | 15.811 | .04000 || 75 | 5625 | 8.660 | 27.386 | .01333

26 | 676 | 5.099 | 16.125 | .03846 || 76 | 5776 | 8.718 | 27.568 | .01316
27 | 729 | 5.196 | 16.432 | .03704 || 77 | 5929 | 8.775 | 27.749 | .01299
28 | 784 | 5.292 | 16.733 | .03571 || 78 | 6084 | 8.832 | 27.928 | .01282
29 | 841 | 5.385 | 17.029 | .03448 || 79 | 6241 | 8.888 | 28.107 | .01266
30 | 900 | 5.477 | 17.321 | .03333 || 80 | 6400 | 8.944 | 28.284 | .01250

31 | 961 | 5.568 | 17.607 | .03226 || 81 | 6561 | 9.000 | 28.460 | .01235
32 | 1024 | 5.657 | 17.889 | .03125 || 82 | 6724 | 9.055 | 28.636 | .01220
33 | 1089 | 5.745 | 18.166 | .03030 || 83 | 6889 | 9.110 | 28.810 | .01205
34 | 1156 | 5.831 | 18.439 | .02941 || 84 | 7056 | 9.165 | 28.983 | .01190
35 | 1225 | 5.916 | 18.708 | .02857 || 85 | 7225 | 9.220 | 29.155 | .01176

36 | 1296 | 6.000 | 18.974 | .02778 || 86 | 7396 | 9.274 | 29.326 | .01163
37 | 1369 | 6.083 | 19.235 | .02703 || 87 | 7569 | 9.327 | 29.496 | .01149
38 | 1444 | 6.164 | 19.494 | .02632 || 88 | 7744 | 9.381 | 29.665 | .01136
39 | 1521 | 6.245 | 19.748 | .02564 || 89 | 7921 | 9.434 | 29.833 | .01124
40 | 1600 | 6.325 | 20.000 | .02500 || 90 | 8100 | 9.487 | 30.000 | .01111

41 | 1681 | 6.403 | 20.248 | .02439 || 91 | 8281 | 9.539 | 30.166 | .01099
42 | 1764 | 6.481 | 20.494 | .02381 || 92 | 8464 | 9.592 | 30.332 | .01087
43 | 1849 | 6.557 | 20.736 | .02326 || 93 | 8649 | 9.644 | 30.496 | .01075
44 | 1936 | 6.633 | 20.976 | .02273 || 94 | 8836 | 9.695 | 30.659 | .01064
45 | 2025 | 6.708 | 21.213 | .02222 || 95 | 9025 | 9.747 | 30.822 | .01053

46 | 2116 | 6.782 | 21.448 | .02174 || 96 | 9216 | 9.798 | 30.984 | .01042
47 | 2209 | 6.856 | 21.679 | .02128 || 97 | 9409 | 9.849 | 31.145 | .01031
48 | 2304 | 6.928 | 21.909 | .02083 || 98 | 9604 | 9.899 | 31.305 | .01020
49 | 2401 | 7.000 | 22.136 | .02041 || 99 | 9801 | 9.950 | 31.464 | .01010
50 | 2500 | 7.071 | 22.361 | .02000 || 100 | 10000 | 10.000 | 31.623 | .01000
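
Any row of Table XXI can be recomputed as a check on a doubtful digit. The following is a minimal sketch; the function name `table_row` is hypothetical, and ordinary rounding to three and five places is assumed, which may differ from the printed table in an exact tie such as 1/64.

```python
import math


def table_row(n: int):
    """Return (n, n squared, sqrt(n), sqrt(10n), 1/n), rounded as in the table."""
    return (
        n,
        n * n,
        round(math.sqrt(n), 3),
        round(math.sqrt(10 * n), 3),
        round(1 / n, 5),
    )


if __name__ == "__main__":
    for n in (27, 72, 97):
        print(table_row(n))
```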
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TABLE XXII Four-Place Logarithms 


0043 0086 0128 
9453 0492 0531 
0828 0864 0899 
1173 1206 1239 
1492 1523 1553 
1790 1818 1847 
2068 2095 2122 
2330 2355 2380 
2577 2601 2625 
2810 2833 2856 
3032 — 3054 3075 
3243 3263 3284 
3444 3464 3483 


3636 3655 3674 
3820 3838 3856 


0170 0212 0253 
0569 0607 0645 
0934 0969 1004 
1271 1303 1335 
1584 1614 1644 
1875 1903 1931 
2148 2175 2201 
2405 2430 2455 
2648 2672 2695. 
2878 2900 2923 
3096 3118 3139 
3304 3324 3345 
3502 3522 3541 


3692 3711 3729 
3874 3892 3909 
4048 4065 4082 
4216 4232 4249 
4378 4393 4409 


4533 4548 4564 
4683 4698 4713 
4829 4843 4857 
4969 4983 4997 
$105 5119 5132 
5237 5250 5263 
5366 5378 s3or | 
5490 5502 5514 | 
5611 5623 5635 
5729 5740 5752 


5843 5855 5866 


0204 0334 0374 
0682 0719 0755 
1038 1072 1106 


1367 "1399 1430 
1673 1703. 1732 
1959 — 1987 2014 


(2227 2253 2279 
2480 2504 2529 


2718 2742 2765 
2945 ^ 2967 2989 
3160 3181 . 3201 
3365 3385 3404 
3560 35799 3598 


3747 3766 3784 
3927 — 3945 3962 
4099 — 4116 4133 
4265 4281 4298 
4425 4440 4456 


4579 4594 4609 
4728 — 4742 4757 
4871 4886 4900 
5011 5024 5038 
5145 5159 5172 
5276 5289 5302 
5403 — 5416 5428 

5527 ^ 5539 ^ 5551. 
5647 5658 5670 
5763 5775 5786 


5877 5888 5809 


4487 4502 4518 
4639 4654 4669 

4786 4800 4814 
4914 | 4928 4942 4955 
32 | 5051 | 5065 5079 5092 
33 | 5185 | 5198 5211 5224 
34 | 5315 | 5328 — 5340 5353 
_35 | 5441 | 5453 5465 5478 
36 | 5563 | 5575 5587 5599 
37 | 5682 | 5694 5705 5717 


38 | 5798 | 5809 5821 5832 


39 | sott | 5922 5933 5944 5955 5966 5977 5988 5999 бото 
40 | 6021 | 6031 6042 6053 6064 6075 6085 609 6107 6117 


6149 6160 
6243 6253 6263 


43 | 6335 | 6345 6355 6365 | 6375 6385 6395 
44 | 6435 | 6444 6454 6464 6474 6484 6493 
45 | 6532 | 6542 6551 6561 6571 6580 6590 
6628 | 6637 6646 6656 6665 6675 6684 
6721 | 6730 6739 6749 6758 6767 6776 
6812 | 6821 6830 6839 6848 6857 6866 
6902 | 6911 — 6920 6928 6937 6946 6955 
50 | 6990 | 6998 — 7007 7016 7024 7033 7042 
7076 | 7084 7093 7101 7110 7118 7126 
7160 | 7168 7177 7185 7193 7202 7210 
7243 | 7251 7259 7267 7275 7284 7292 
7324 | 7332 7340 7348 7356 7364 7372 


6170 6180 6191 
6274 6284 6294 
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TABLE XXII Four-Place Logarithms (Continued) 


55 | 7404 | 7412 7419 7427 7435 7443 7451 
56 | 7482 | 7490 7497 7505 | 7513 7520 7528 
57 | 7559 | 7566 7574 7582 | 7589 7597 7604 
58 | 7634 | 7642 7649 7657 | 7664 7672 7679 
59 | 7709 | 7716 7723 — 7731 | 7738 7745 7752 
60 | 7782 | 7789 7796 7803 7810 7818 7825 
61 | 7853 | 7860 7868 7875 | 7882 7889 7896 
62 | 7924 | 7931 7938 7945 | 7952 7959 7966 
63 | 7993 | 8000 8007 8014 8021 8028 8035 
64 | 8062 | 8069 8075 8082 8о89 8096 8102 
765 | 8129 | 8136 8142 8149 | 8156 8162 8169 
66 | 8195 | 8202 8209 8215 | 8222 8228 8235 
67 | 8261 | 8267 8274 8280 8287 8293 8: 


8331 8338 8344 | 8351 8357 8363 
388 -| 8395 — 840r 8407 8414 8420 8426 
8457 8463 8470 8476 8482 8488 
8519 8525 8531 | 8537 8543 8549 
8579 8585 8591 8597 8603 8609 
8639 8645 865: | 8657 8663 8669 
8698 8704 8710 8716 8722 8727 
8756 8762 8768 | 8774 8779 8785 
76 | 8808 | 8814 8820 8825 8831 8837 8842 
77 | 8865 | 8871 8876 8882 8887 8893 8899 


7459 — 7466 7474 
7536 7543 7551 
7612 7619 7627 


7686 7694 7701 
7760 7767 7774 
78327839 7846 

7903 7910 7917 

7973 7980 7987 

8041 8048 8055 

8109 8116 8122 
8176 8182 8189 

8241 . 8248 8254 

8306 8312: 8319 


8370 8376 8382 
8432 8439 8445 
8494 8500 8506 
8555 8561 8567 
8615 8621 8627 
8675 8681 8686 
8733 — 87390 8745 
8791 8797 8802 
8848 8854 8859 
8904 8910 8915 


78 | 8921 | 8927 8932 8938 | 8943 8949 8954 | 8960 8965 8971 
_79 | 8976 | 8982 — 8987 8993 | 8998 9004 9009 | 9015 9020 9025 


9069 — 9074 9079 
9122 9128 9133 
9175 9180 9186 
9227 9232 9238 

9279 9284 9289 
| 9330 9335 9340 | 
9380 9385 9390 
9430 9435 9440 
9479 9484 9489 
9528 — 9533 9538 

9576 — 9581 9586 
9624 9628 9633 
9671 9675  9680- 


9717 9722 9727 


80 | 9031 | 9036 9042 9047 | 9053 9058 9063 
81 | 9085 | 9090 9096 910r 9106 9112 9117 
82 | 9138 | 9143 9149 9154 | 9159 9165 9170 
83 | 9191 | 9196 9201 9206 9212 9217 9222 
84 | 9243 | 9248 9253 9258 | 9263 9269 9274 
85 | 9294 | 9209 9304 9309 | 9315 9320 9325 
86 | 9345 | 9350 9355 9360 | 9365 9370 9375 
87 | 9395 | 9400 9405 9410 | 9415 9420 9425 
88 | 9445 | 9450 9455 9460 | 9465 9469 9474 
89 | 9494 | 9499 9504 9500 | 9513 9518 9523 
90 | 9542 | 9547 — 9552 9557 | 9562 9566 9571 


91 | 9590 | 9595 обоо 9605 | 9609 9614 9619 
92 | 9638 | 9643 9647 9652 | 9657 9661 9666 


93 | 9685 | 9689 9694 9699 | 9703 9708 9713 

294 | 9731 | 9736 9741 9745 | 9750 9754 9759 | 9763 — 9768 — 9773. 

96 | 9777 | 9782 9786 9791 | 9795 9800 9805 | 9809 o oe 

96 | 9823 | 9827 9832 9836 | 9841 9845 9850 | 9854 9859 9803 

97 | 9868 | 9872 9877 988: | 9886 9890 9894 | 9899 9903 9908 
8 9952 

98 | 9912 | 9917 9921 9926 | 9930 9934 9939 | 9943 994 

99 | 9956 | 9961 9965 9969 | 9974 9978 9983 | 9987 9991 9996 
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TABLE XXIII Random Numbers 


(2) (13) 4) 
99570 91291 90700 


a) 




(2) (3) @ (6 m 
10480 15011 01536 02011 
25595 
22527 


(1) 


| 


19174 39615 
74917 97758 16379 


20969 
52666 
30680 19655 
00849 


91646 69179 
89198 
64809 
16376 


81647 
30995 
76393 


manwa ereoc 


63348 


01263 54613 
44304 42880 


06927 


110 
5053 21916 81825 


41 


i НИЕ 
i REN HB 
di ВНЕ 
2 


223285 


484 


22851 
18510 


32812 
44592 


1 40719 
55157 


05944 
96909 


18296 
07684 36188 
86679 50720 94953 


51132 01915 92747 64951 
94738 17752 35156 35749 
88916 19509 25625 , 58104 
30421 61666 99904 

21524 15227 


06912 17012 64161 


64835 44919 


900 70960 63990 7560 
41135 10367 
32586 


58151 
35806 
46557 
61280 50001 67658 


85030 


64350 
46104 


1259 77452 16308 60756 92144 49442 53 
06691 76988 13602 51851 
09172 30168 90229 04734 50193 22178 
25306 26384 06646 
28728 


74103 47070 


52404 
33362 
91245 85828 14346 : 


51085 12765 51821 5 
54164 58402 22421 
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S228 RARAS 


Taken from the 30-page table of 105,000 random digits prepared by the Bureau of
Transport Economics and Statistics of the Interstate Commerce Commission,
Washington, D.C., Mr. W. H. S. Stevens, Director. It is used in this text with
their permission.


ae (11) (12) a3) (14) 


(9) 


I gu ИШ 
08888 BER АЕ 
ТАЕ 
ОНЫ HA UR 
cided GHEE ТЕНЕ 
HET 8B HB RH 
EE БЕРШШ 
HE BAIE HS 
И ШШШ 
5855 HE BH Hi 
INE Н Е 
HS IB ТЕН 
ПЕЕ ЕТИН 
HE 28088 PREND 22808 


SRRRS #8888 85989 59913 


485 


38857 50490 


S58 BN NP 
SS SEE HH 
53858 ЕЕЕ BENI 
ИРИ 
СООО GG HG 
EE GEE: 


92351 97473 


35648 
54328 
59516 81652 
92350 36693 3 


71013 18735 


41035 80780 


45799 22716 19792 09983 


83765 
97662 24822 94730 
20801 


21999 
79401 21438 83035 
16815 69298 
24360 54224 
00697 35552 
64486 64758 75366 
02584 


i 
iE EI 
HI BIS Hi 
HH. 
ШИ: 22567 E 
СЕ 


55998 332333 35588 


TABLE XXIV Pre-registration Scores of 447 College Students on the Cooperative
Test Service English Test (Data supplied by Dr. Irving Lorge)


Subject Score | Subject Score | Subject Score Subject Score | Subject Score 
1 141 41 133 81 102 121 156 161 093 
2 120 42 146 82 138 122 098 162 094 
3 150 43 138 83 132 | 123 085 | 163 119 
4 156 44 138 84 118 | 124 115 | 164 139 
5 122 45 128 85 135 | 125 129 | 165 166 
6 128 46 127 86 100 | 126 143 | 166 108 
7 112 47 142 87 111 127 108 167 108 
8 178 48 178 88 169 | 128 087 | 168 092 
9 120 49 109 89 124 | 129 121 | 169 175 
19 160 50 131 90 134 130 187 170 160 
11 104 51 157 91 135 | 131 179 | 171 152 
12 088 52 11 92 180 132 140 172 140 
13 100 53 114 93 102 | 133 186 | 173 177 
14 191 54 117 94 117 | 134 086 | 174 133 
15 137 55 115 95 085 135 175 175 11 
16 108 56 155 96 169 | 136 120 | 176 114 
17 147 57 190 97 127 137 175 177 106 
18 127 58 159 98 128 | 138 133 | 178 147 
19 156 59 141 99 131 | 139 107 | 179 090 
20 201 60 100 | 100 167 | 140 119 | 180 167 
21 131 61 - 125 | 101 118 | 141 185 | 181 156 
22 174 62 116 102 142 142 100 182 130 
23 096 63 148 | 103 143 | 143 102 | 183 143 
24 140 64 169 | 104 094 | 144 138 | 184 142 
25 102 65 189 | 105 117 | 145 129 | 185 151 
26 090 66 145 | 106 114 | 146 184 | 186 168 
27 177 67 084 | 107 091 | 147 145 | 187 134 
28 125 68 180 | 108 091 | 148 148 | 188 120 
29 164 69 139 | 109 120 | 149 155 | 189 124 
30 150 70 166 | 110 119 | 150 124 | 190 176 
31 181 71 159 | 111 185 | 151 109 | 191 170 
32 118 72 126 | 112 158 | 152 103 | 192 120. 
33 192 78 177 | 113 138 | 153 113 | 198 165 
34 169 74 165 | 114 128 | 154 135 | 194 102 
35 117 75 174 | 115 199 | 155° 117 | 195 109 
36 152 76 173 | 116 111 | 156 123 |. 196 116 
37 176 77 149 | 117 105 | 157 101 | 197 145 
38 089 78 173 | 118 100 | 158 107 | 198 136 
39 151 79 093 | 119 158 | 159 108 | 199 099 
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Subject Score | Subject Score | Subject Score | Subject Score Subject Score 
201 108 | 251 098 | 301 130 | 351 095 | 401 068 
202 144 | 252 104 | 302 140 | 352 185 | 402 078 
203 117 | 253 128 | 303  113| 353 095 | 403 063 
204 166 | 254 131 | 304 189 | 354 160 | 404 077 
205 115 255 099 305 152 355 133 405 032 
206 089 | 256 178 | 306 105 | 356 120 | 406 058 
207 128 257 119 307 11 357 149 407 063 
208 165 | 258 114 | 308 109 | 358 137 | 408 070 
209 176 | 259 090 | 309 139 | 359 187 | 409 058 
210 148 | 260 129 | 310 110 | 360 115 | 410 053 
211 127 | 261 148 | 311 156 | 361 087 | 411 080 
212 143 262 096 312 156 362 097 412 072 
213 143 263 108 313 103 363 124 413 052 
214 183 264 162 314 165 364 128 414 078 
215 151 265 121 315 121 365 158 415 069 
216 144 266 154 316 105 366 157 416 083 
217 139 | 267 178 | 317 153 | 367 149 | 417 066 
218 146 | 268 168 | 318 132 | 368 137 | 418 055 
219 151 | 269 089 | 319 114 | 369 079 | 419 036 
220 086 | 270 147 | 320 184 | 370 068 | 420 051 
221 136 | 271 154 | 321 118 | 371 058 | 421 070 
222 148 272 087 322 105 372 081 422 080 
223 107 | 273 177 | 828 186 | 373 045 | 423 072 
224 132 | 274 154 | 324 141 | 374 072 | 424 073 
225 180 | 275 140 | 325 173 | 375 060 | 425 076 
226 118 | 276 093 | 326 124 | 376 050 | 426 083 
227 106 | 277 148 | 327 140 | 377 075 | 427 080 
228 086 | 278 088 | 328 103 | 378 044 | 428 082 
229 143 | 279 157 | 329 165 | 379 076 | 429 081 
230 158 | 280 136 | 330 172 | 380 039 | 430 057 
231 122 281 094 331 185 381 058 431 078 
232 096 | 282 155 | 332 094 | 382 071 | 432 052 
233 143 | 283 163 | 333 094 | 383 029 | 433 063 
234 151 | 284 123 | 334 091 | 384 072 | 434 065 
235 163 | 285 110 | 335 164 | 385 063 | 435 052 
236 132 286 125 336 172 386 080 436 081 
237 168 | 287 i47 | 337 174 | 387 083 | 437 056 
238 125 | 288 092 | 338 136 | 388 050 | 438 081 
239 164 289 091 339 164 389 070 439 058 
240 162 | 290 169 | 340 101 | 390 082 | 440 083 
241 106 291 180 341 131 391 078 441 072 
242 085 | 292 149 | 342 129 | 392 058 | 442 016 
243 159 | 293 115 | 343 144 | 393 080 | 443 069 
244 136 294 155 344 178 394 069 444 076 
245 152 | 295 142 | 345 088 | 395 040 | 445 063 
246 144 296 157 346 183 396 046 446 028 
247 122 | 297 104 | 347 146 | 397 067 | 447 082 
248 185 | 298 086 | 348 099 | 398 047 
249 159 299 109 349 116 399 047 
250 136 300 132 350 108 400 065 
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Glossary of Symbols 


The letters X and Y (and Z when a third letter is needed) are here customarily
used to denote random variables and observations on individuals. The other
English letters, capital or small, are used as convenience may require to denote
constants in an equation, or to name groups or categories within groups. Letters
at the first of the alphabet are used most often for this purpose. Such casual uses,
to which one letter might be put as well as another, will not be included in this
glossary. Small letters from the middle of the alphabet, especially i and j but
also h, k, and l, are used as variable subscripts. (See pages 112 and 197.)

Parameters are customarily denoted by Greek letters and sample statistics
by English. However, certain exceptions have been made, as in the use of χ²
to denote a statistic and of F, P, and Q to denote the frequency and proportion
in a population.

The use of multiple subscripts has been treated on page 196. The use of sub- 
scripts in multiple and partial correlation and regression is discussed in Chapter 13. 

A bar over any letter indicates the sample mean of the variable denoted by 
the letter. 

Greek letters and English letters are listed separately according to their re- 
spective alphabets. Symbols of operation are listed separately. 


a (1) Constant term in regression equation; (2) observed frequency
in one class of double dichotomy.

$a_i$ (1) Frequency in ith row and first column of a sample in which
horizontal trait is dichotomous, or similarly for reversal of rows and
columns (see page 99); (2) constant term in the regression equation
for the ith group.

A (1) Constant term in regression equation; (2) $\sum a_i$; (3) alternative
to a stated hypothesis; (4) arbitrary origin.

b Frequency observed in one class of a double dichotomy.

$b_i$ (1) Frequency in ith row and second column of a sample in which
horizontal trait is dichotomous (see page 99); (2) regression coefficient
for the ith group.

$b_{yx}$ Regression coefficient in the sample equation to estimate y from x,
or Y from X (similarly $b_{xy}$, $b_{yz}$, etc.).

$b^*_{yx}$ Analogous to $b_{yx}$, except that each variable is expressed as a multiple
of its standard deviation, or $b_{yx} s_x / s_y$.

$b_{12.34}$ Coefficient for the term involving $x_2$ (or $X_2$) in the sample regression
equation to estimate $x_1$ (or $X_1$) from $x_2$, $x_3$, and $x_4$ (or $X_2$, $X_3$, and
$X_4$). (See Chapter 13 for use of other subscripts.)

$b^*_{12.34}$ Analogous to $b_{12.34}$ except that each variable is expressed as a multiple
of its standard deviation, or $b_{12.34} s_2 / s_1$.

B (1) $\sum b_i$; (2) a term in Bartlett's test for homogeneity of variance.

B' Approximation to B in Bartlett's test.


Cur 
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(1) Observed frequency in one class of a double dichotomy; (2) the 
number of columns in a layout of rows and columns. 

(1) Coefficient of contingency; (2) a class or category, usually with 
subscripts to indicate which class; (3) the mean square among 
columns as in Chapter 13; (4) sum of squares or sum of products
as used in Chapter 15; (5) elements of the inverse matrix in multiple 
regression. 

The sum of the squares of deviations from the mean, or $\sum_{\alpha} (X_{i\alpha} - \bar{X}_i)^2$
for the ith group or category, and similarly for $C_{yyi}$. (See Chapter 15.)

Sum of squares between groups, or $\sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})^2$, and similarly
for Cys. 


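Sum of squares within groups, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)^2$, and similarly
for $C_{yyw}$. (See Chapter 15.)

The sum of products $\sum_{\alpha} (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i)$ for the ith group or
category. (See Chapter 15.)

Sum of cross products for means, or $\sum_{i=1}^{k} N_i (\bar{X}_i - \bar{X})(\bar{Y}_i - \bar{Y})$.
(See Chapter 15.)

Sum of cross products within groups, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X}_i)(Y_{i\alpha} - \bar{Y}_i)$.
(See Chapter 15.)

Sum of squares for total group, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})^2$, and similarly
for $C_{yyT}$. (See Chapter 15.)

Sum of cross products for total group, or $\sum_{i=1}^{k} \sum_{\alpha=1}^{N_i} (X_{i\alpha} - \bar{X})(Y_{i\alpha} - \bar{Y})$.
(See Chapter 15.)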


C(A < μ < B) = 1 − α A statement made with confidence coefficient 1 − α
that μ lies between A and B, and similarly for statements about
other parameters.

(1) An error of measurement, as in Chapter 12; (2) ua — ин as 
on page 163; (3) X; — X as on page 128; (4) observed frequency
in one class of a double dichotomy. In some texts d indicates the 
deviation of a score from its mean but it is not so used here. 
Degrees of freedom. See also n. 

(1) Weighted difference between means as on page 399; (2) difference 
between two scores of same individual as on page 152; (3) maximum 
vertical difference between two cumulative frequency polygons in 
K-S test. 

(1) A mathematical constant approximately equal to 2.718 occurring 
frequently in equations of probability distributions; (2) an error of
measurement as used in Chapter 12. 

Mean square for error as used in Chapter 14. 
$E_{xy}$, $E_{yx}$ Correlation ratio in a sample. (In some texts the symbols $\eta_{xy}$ and
$\eta_{yx}$ are used, but here the latter denote population values of the
correlation ratio.)


E( ) The expectation of, or expected value of whatever variable is denoted
within the parenthesis, as E(X), E(X − μ), E(p), etc. (see page 27).

f Frequency in a sample.

F (1) Frequency in a population; (2) variance ratio in a sample,
ordinarily used with a subscript to denote the probability under
some hypothesis of obtaining a smaller value by chance.


$F_{.95}$ The 95th percentile of the variance ratio distribution, and similarly
for other subscripts.

$F_{max}$ The ratio of the largest to the smallest variance (see page 192).

H Test statistic for comparing samples by means of the sums of ranks. 


H: μ = A The hypothesis that the mean of a population is equal to A, and
similarly for other hypotheses.

h A variable subscript.

i (1) A variable subscript (see pages 112 and 197); (2) the width of 
a class interval. 


j A variable subscript. 

k (1) The number of groups, classes, variables, strata, and the like; 
(2) a variable subscript. 

$l_i$ Weight given ith variable, used on page 356.

mi Weight given ith variable, used on page 356. 

m (1) The number of terms accumulated in one tail of the binomial
distribution as used in Table IV; (2) the number of items in a test;
(3) the number of clusters in a cluster sample, as used on page 176.


M (1) The number of cases in a finite population (see page 167); (2) the 
number of clusters in a finite population (see page 176). 

n (1) The number of degrees of freedom; (2) the number of cases in 
a subclass (occasional use). 

$n_1$ The number of degrees of freedom for the numerator mean square of
a variance ratio.

$n_2$ The number of degrees of freedom for the denominator mean square
of a variance ratio.

N The number of cases in a sample, sometimes used with a subscript
to indicate a subsample.

N! Factorial N (see page 18). 

(3) The number of combinations of N things taken r at a time. 

p The proportion of elements in one class of a dichotomous sample. 

$p_i$ The proportion of elements in ith class of a sample.

P The proportion of elements in one class of a dichotomous population.


P; 

P(X is in $C_i$) Probability that X is in the ith class. (See page 13 for variants.)

P($X_2$ is in $C_j$ | $X_1$ is in $C_i$) Probability that $X_2$ will fall in the jth class if $X_1$
falls in the ith class. (See page 14 for variants of this.)

P(A < x < B) Probability that x will fall between the fixed values A and B.

q 1 − p


Tz.yz, 11.23 
ту.) "12.8 
Tzs 


pzz 


рагу 

R 

К; 
Ёнг Ёш 


Ёк у, Tecos 
(1) Coefficient of correlation; (2) the number of rows in a layout
of rows and columns.

Biserial correlation. 

Point biserial correlation.

Tetrachoric correlation.

Coefficient of multiple correlation. (See Chapter 13 for use of sub- 
scripts. See Rzy: below). 

Coefficient of partial correlation (see Chapter 13 for use of subscripts). 

Coefficient of reliability. 

Spearman-Brown coefficient of reliability when test length is in- 
creased p times. 

Estimate of correlation between two tests when one is lengthened p 
times and the other q times. 

(1) Rank order correlation; (2) row mean square in analysis of 
variance. 

Sum of ranks for group i.

The number of correct responses to the ith item among the upper 
and lower groups, when these groups have been selected on the basis 
of total test score. 

Coefficient of multiple correlation. (See Chapter 13 for use of sub- 
scripts. Also written тшу, 1,23.) 

(1) Standard deviation of a sample, used often with a subscript to
name the variable; (2) standard error of a statistic as estimated
from sample values, used with a subscript to name the statistic,
as $s_{\bar{X}}$, $s_p$, etc. (see σ).

The standard error of estimate for a sample De у). 

Variance, or square of the standard deviation, with meanings anal- 
ogous to those stated above for s. 

Variance of y-residuals from regression line to estimate y from z, 
and similarly for other subscripts. (See Chapter 10 for formula 
which differs slightly from formula in some other texts.) 

Sum of squares used in analysis of variance and covariance. For 
meaning of Sw, Ss, Se Si, S2, Ss and Sa, see Chapter 15. 

(1) Ratio of a variable with unit normal distribution to the sample 
estimate of the standard error of that variable; (2) the number of 
observations in a tied set, as in Chapter 18. 

(t − 1)t(t + 1) where t is the number of observations in a tied set.
The total of all scores in group i, or $\sum_{\alpha} X_{i\alpha}$.

The total sum of deviations of coded scores in group i from an arbitrary
origin, or $T'_i = \sum_{\alpha=1}^{N_i} x'_{i\alpha}$.

(1) Ordinate of unit normal curve; (2) statistic into which the vari- 
ance ratio F is transformed, as in Chapter 17. 

The number of runs in a sample (see Chapter 18). 

Width of a confidence interval. 

Coefficient of concordance. 

Deviation of a score from its sample mean, $x = X - \bar{X}$. (Similarly
for other letters.)
Deviation of a score from an arbitrary origin, $x' = X - A$.
A gross score. (Similarly for other letters.) 

Mean of a sample. (Similarly for other letters.) 

Sample estimate computed from a regression equation. 
Median of a sample. 

25th percentile of a sample. 


Test score of an individual in the upper or lower group.


Analogous to z, 2’, X, Z, X 
Ordinate of population distribution on a continuous variable, 
A variable with unit normal distribution, or an abscissa of the
normal probability curve. (In other texts this letter is sometimes
used for an ordinate of the normal probability curve but is not so
used here.)
Transformation used in testing significance of a correlation coefficient. 
Transformation used in testing significance of a biserial coefficient 
of correlation. 
(1) Level of significance; risk of Type I error; (2) subscript attached
to a statistic (as $z_\alpha$, $t_\alpha$, $\chi^2_{1-\alpha}$) to indicate probability of obtaining
a smaller value of the statistic by chance; (3) subscript attached to
X, x, Y, or y to indicate that the observation is for the αth individual.
(1) Probability of accepting a hypothesis when an alternative is
true, risk of Type II error; (2) population regression coefficient.
(For subscripts used with β see Chapters 10 and 13.)
Population regression coefficient when variables are expressed as
multiples of their standard deviations. (Subscripts same as for β.)
Population value corresponding to the statistic #;. 
Population value corresponding to z*. 
Correlation ratio in the population. 
Population mean. (Subscript may be used to indicate the variable 
or statistic of which it is the mean.) 
Median of a population (see Chapter 16). 
Percentile of a population, the percent of cases with lower scores 
being indicated by the subscript (see Chapter 16). 
3.1416, or the ratio of the circumference of a circle to its diameter.
Correlation coefficient in a population. (In some other sources it 
is used to denote a coefficient of rank order correlation but is not so 
used here.) 
Reliability coefficient in a population. 
Coefficient of partial correlation in a population. (For other sub- 
scripts, see Chapter 13.) 
Coefficient of multiple correlation in a population, correlation of $x_1$
with regression estimate of $x_1$ made from best linear combination
of $x_2$, $x_3$, and $x_4$. (For other subscripts, see Chapter 13.)
(1) Standard deviation of a population, often used with a subscript
to name the variable as $\sigma_x$, $\sigma_y$; (2) standard error of a statistic, used
with a subscript to name the statistic, as $\sigma_{\bar{X}}$, $\sigma_p$, $\sigma_r$, etc.; (3) standard
error of estimate for a population = standard deviation of residuals
from a regression equation, used with subscripts as $\sigma_{1.234}$, in which
the primary symbol before the period names the variable estimated
from the regression equation and the secondary subscripts following
the period name the independent variables employed in the equation.
Variance of a population, with subscripts as for σ.

The sum of. 


Xa + Хаа +++ + Xp (see pages 111 and 197). 


(1) A measure of relation based on order, discussed in Chapter 11; 
(2) the “true score” of an individual on X, discussed in Chapter 12, 
The “true score" of an individual on Y, discussed in Chapter 12, 
(1) A measure of relationship between two dichotomous variables, 
discussed in Chapter 11; (2) a transformation of percents to angles, 
described in Chapter 16. 

Chi-square. 

The fifth percentile of the χ² distribution. (In other sources this
symbol may mean the 95th percentile.)

Chi-square computed from ranks.

Chi-square after application of Yates' correction.


X is greater than Y, or Y is less than X. 

X is greater than or equal to Y; X is not less than Y; Y is less than
or equal to X; or Y is not greater than X. 

Is equal to. 

Is not equal to. 

Is approximately equal to. 

"The absolute or numerical value of a, the sign being taken as positive. 
The positive square root of a. 

The positive square root of a. 


Answers to Problems 


Exercise 2.1, page 17 
1. b. P(X = 1) = P(X =2) =... = P(X = 6) 
c. P(X = 1 or X = 2) = P(X = 1) + P(X = 2) = 1/3
d. P(X = 1) + P(X = 3) + P(X = 5) = 1/6 + 1/6 + 1/6 = 1/2
f. P(X #0) = 1 — P(X 0) е1 фф 
2. a. The probability that 3 will appear on both throws.
b. The probability that the sum of the numbers appearing on the two throws 
will be 2, 


[3 The probability that the sum of the numbers appearing on the two throws 
will be 3. 


& The probability that both numbers will be smaller than 6. 
h. The probability that the sum of the two numbers will be 11. 


Exercise 2.2, page 19 
2. а. 120 »b, 210 Gm 210 e6 £16 g 35 h 3 


Exercise 2.3, page 26 
1. (.7 + .3)^4 = .240 + .412 + .265 + .076 + .008 = 1.001
3. a. (.5 + .5)^3 = .125 + .375 + .375 + .125 = 1.000
b. (.5 + .5)^6 = .016 + .094 + .234 + .312 + .234 + .094 + .016 = 1.000
c. (.5 + .5)^10 = .001 + .010 + .044 + .117 + .205 + .246 + .205 + .117
+ .044 + .010 + .001 = 1.000
4. a. (.4 + .6)^3 = .064 + .288 + .432 + .216 = 1.000
b. (.4 + .6)^6 = .004 + .037 + .138 + .276 + .311 + .187 + .047
c. (.4 + .6)^10 = .000 + .002 + .011 + .042 + .111 + .201 + .251 + .215
+ .121 + .040 + .006
5. a. (.8 + .2)^3 = .512 + .384 + .096 + .008
b. (.8 + .2)^6 = .262 + .393 + .246 + .082 + .015 + .002 + .000
c. (.8 + .2)^10 = .107 + .268 + .302 + .201 + .088 + .026 + .006 + .001
+ .000 + .000
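The expansions in Exercise 2.3 can be checked term by term. The following is a minimal sketch, not part of the original answers; the function name binomial_terms is hypothetical, and the exponents 3, 6, and 10 are inferred from the number of terms printed in each answer.

```python
from math import comb

def binomial_terms(q, p, n):
    """Return the n + 1 terms of (q + p)^n, ordered from q^n to p^n."""
    return [comb(n, k) * q ** (n - k) * p ** k for k in range(n + 1)]

# Terms of (.5 + .5)^3, rounded to three places as in the answers.
terms = binomial_terms(0.5, 0.5, 3)
print([round(t, 3) for t in terms])  # [0.125, 0.375, 0.375, 0.125]

# The terms of any such expansion sum to 1 apart from rounding.
print(round(sum(binomial_terms(0.8, 0.2, 10)), 6))
```

Because each printed term is rounded separately, a printed sum such as 1.001 can differ from 1 in the last place.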
Exercise 2.5, page 30 
1. a. E(Np) = NP = 10(.5) = 5 b. No; No. c. Yes; E(Np) = 20(.5) = 10
d. σ²Np = NPQ = 10(.5)(.5) = 2.5; 500 observations are actually a sample; 1000
provide better approximation than 500.
e. Yes; σ²Np = 20(.5)(.5) = 5.0
2. c. Different only because of sampling error.
4. No; 800 observations are subject to sampling error. 
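The mean NP and variance NPQ used in Exercise 2.5 can be confirmed by summing over the whole binomial distribution. This is a sketch under the assumption that the answers refer to N = 10 and N = 20 trials with P = .5; the function name is hypothetical.

```python
from math import comb

def binomial_mean_variance(n, p):
    """Mean and variance of the number of successes, computed term by term."""
    probs = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(probs))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(probs))
    return mean, var

for n in (10, 20):
    mean, var = binomial_mean_variance(n, 0.5)
    # Compare with the closed forms NP and NPQ.
    print(n, round(mean, 6), round(var, 6), n * 0.5, n * 0.5 * 0.5)
```

The term-by-term results agree with NP and NPQ, which is why the closed forms can be used directly in the answers.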
Exercise 2.6, page 36


3. а. 12 b. с. 99 4. 05 e 90 
t. 98 g 05 һ. 22 L0 j 55 
к. 10 L 00 m. 005 n. 002 о. 06 

4. а. — 1.037 b. — 1751 c 485 4. A745 е. 1.046 
f. 1.900 g 2.576 h. ~ 2.576 L 1.96 ) sns 


Exercise 3.1, page 51 
1. If p = А, accept all stated hypotheses except Р = $9, 
If p = .5, accept all stated hypotheses except Р = 10 and Р = 94, 
If p = .2, reject all stated hypotheses except Р = 494 and Р = 40. 
2. If p = .4, range is .12 < P < .75.
If p = .5, range is .18 < P < .82.
If p = .2, range is .02 < P < .55.


Exercise 3.2, page 57 

5. а. 41 <Р < 82 b. 50 <Р < 74 e 50 <P < 1 
d, I< P< 45 e 14 <Р.< 40 t 22 <Р < 28 

6. As N increases, width of interval decreases.

1. 05 <P < .17 8. 05 < P > 83 


Exercise 3.3, page 59 
4a X S3and X 11 b. X$i4andX z M e X£Z7and X z 18 


Exercise 3.4, page 63


3. а. HJ b. HK & AB 4. AC е. JM 
f. KM g BD b. CD 1 BD J. ср 
4. а= 02; a = 11 6 P= 50 ба» 1 
7. α = ordinate at point H; β = 1 − ordinate at alternative to H.
8. α = .172. See A of Figure 3-5. 9. α = .172. See C of Figure 3-5.
Exercise 3.5, page 66 
8. 17; A1; 17 4. more satisfactory, 
5. а. about 05 b. about 2 €. about 4 
d. about .05 е. about 8 1. about 6 
6A; С т.с; 4 
Risk of error. 
10. Situation Decision a в 
І R n 0 
2 A 0 x 
3 R A 0 
4 A 0 x 
5 A 0 x 
Exercise 3.6, page 76


1. .36 < P < .48
2. Yes; p = .48; z = −3.46; zα = −2.05; Reject H: P > .60
3. N = 721 4. N = 63 5. N = 8131


Review Exercise, Page 79 
3. RII; AI; RII; AI; AI 


Exercise 4.1, page 92 
1. a. The probability of obtaining a random sample which will yield a value of
χ² less than 9.9 in a situation in which there are 21 degrees of freedom is .02.
h. In χ² problems involving 25 degrees of freedom, 90% of random samples
may be expected to yield values of χ² between 14.6 and 37.7.
m. The probability that a random sample will show a value of χ² larger than 42
if the number of degrees of freedom is 22 cannot be read exactly from Table
VIII but that probability is between .005 and 01; there is probability be- 


is 22. 
2. and 3. xis Х.ю 
Der. elder A 18.5 A 
c КАДЕР 18.5 A 
б: Mi 85 18.5 R 
e 250 A 30.6 A 
f. 377 В 443 R 
E 438 A 50.9 A 
h. 263 В 32.0 A 
i 10.7 Р 24.7 A 
Jj: 95 R 13.3 A 


Exercise 4.2, page 94 

1.a. n =3, N = 10 b.n =3, N = 50 

2. Better fit in (b) because N is larger. 

3. Number of heads: 0 1 2 3 
Expected frequency: 4 16 94 16 
Observed frequency: 2 LE 28 20 
X = 5.29; х = 9.5; No 4. x? = 10.58 

b. χ² = 5.29r. Multiplying observed and theoretical frequencies by r makes χ²
r times as large.
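The scaling property stated in this answer follows from the form of the statistic: each term (O − E)²/E becomes (rO − rE)²/(rE) = r(O − E)²/E. A small sketch illustrates it; the frequencies below are hypothetical and are not the data of the exercise.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 18, 30]  # hypothetical observed frequencies
expected = [15, 15, 30]  # hypothetical theoretical frequencies

base = chi_square(observed, expected)
for r in (2, 3, 10):
    scaled = chi_square([r * o for o in observed], [r * e for e in expected])
    # The ratio scaled / base equals r apart from floating-point error.
    print(r, round(scaled / base, 6))
```

The same proportional discrepancy therefore gives a larger chi-square in a larger sample.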




Exercise 5.1, page 113 


9 8 8 8 8 
2. a. ух; b. yA = 4x с. У зь; = 3 Уз 
i=6 


i=6 i-5 =5 i=6 


8 8 8 
d. puc —4)- Y Y, — 44 е. Ўлх, 


=5 =1 
Exercise 5.1, page 113 cont.


3. a. 60 b. 5 c. 0 d. 26 e. 12 
f. 42 g. 625 h. 5 i 125 j +26 
4. a. 2 b. 6 c. 424 d. 129 e. 9
f, 64 g. 64 h. 26 i 26 


Exercise 5.2, page 117 
2. X —.813, s = 5.40 1 
з. а. X = 11, # = 355 b. X =15,8=299 co X= 2.57, 8 = 6.95 


Review Exercise, page 123 
ame b шош &X,s x, X-u dl 
с, does not fall into any of the classes named, for its value depends upon sample 
size as well as population P. 


Exercise 7.1, page 144 
1. a. 2.5 b. 1 c. .5 d. .05
e. 4 f. 8 g. 2 h. 4


2. 044; .913; .0001; .9998 


Exercise 7.2, page 146 
4. a. t.05 = −2.35 and t.95 = 2.35 when n = 3
b. t.05 = −1.75 and t.95 = 1.75 when n = 16
c. t.05 = −1.645 and t.95 = 1.645 when n = 500
5. t.10 = −3.08 and t.90 = 3.08 when n = 1
t.10 = −1.37 and t.90 = 1.37 when n = 10
t.10 = −1.282 and t.90 = 1.282 when n = 400
6. a. .20 < P < .40 when n = 1 b. .05 < P < .10 when n = 3
c. .001 < P < .01 when n = 500


Exercise 7.3, page 151 
3. 95%; all 4 or 100% 
90%; all 4 or 100% 
50%; 3 or 75% 
10%; 1 or 25% 
4. C(25.6 < μ < 28.8) = .99; C(26.0 < μ < 28.4) = .95


Exercise 7.4, page 160 
1. t = 2.22. If α = .05, reject H: μx = μy. If α = .01, accept H.
2. D̄ = 1.48; sD̄ = .754; t = 1.96; t.975 = 2.02

If α = .05, accept H: μx = μy


2.2 
. F ip, t = سک‎ = 12 
3 FO qst ver: 
1.95 


t = —— = 2,53 

For balance, CU 

Exercise 7.5, page 169 
1. a. .158 b. 475 с. .053 
2. a. .224 b. 5 c. .167 
Exercise 7.5, page 169 cont.
3. N = 750 
4. a. t = 4.4/3.92 = 1.12; t.995 = 3.25; accept H
b. t = 4.4/4.1 = 1.07; accept H
c. n = 9.54 − 2 = 7.5 or 7; accept H
7. a. t = −3.2/1.386 = −2.31; t.025 = −2.20; reject H
b. Accept H without computation as X̄ < 35
c. t = −2.308; t.05 = −1.80; reject H
d. 28.75 < μ < 34.85
e. 27.49 < μ < 36.11




8. a. (7.10), normal b. (7.10), n = 5
c. (7.21), normal d. (7.21), n = 31
e. (7.24), normal f. (7.24), t, n = 11.6

9. a. (6) b. (1) c. (1) or (2) d. (6)

10. (1) N = 292 (2) N1 = N2 = 420 (3) N1 = 346, N2 = 577
(4) N1 = N2 = 47 (5) N1 = 44, N2 = 87


Exercise 8.1, page 183 


8. 90%; 3 or 75% 
80%; 3 or 75% 
60%; 2 or 50% 
40%; 1 or 25% 

9. x? = 6800/237.16 = 28.67 > xta 
10. 4.53 < σ < 7.42


Exercise 8.2, page 187 


      F      F.95   F.99   Probability
2. a. 1.38   1.74   2.20   P > .10
   b. 2.88   1.95   2.56   P < .02
   c. 1.76   2.18   3.09   P > .10
   d. 4.78   2.90   4.73   P < .02
   e. 2.19   1.72   2.19   P = .02


Exercise 9.1, page 198
See Table 9.1, page 199 


Exercise 10.1, page 242 
Data required for 1 and 2: 


F = 50.83; (№ — 1)s = 2869.8; з, = 70.00; 8, = 8.367 


Х = 47.10; (№ — 1)s = 5078.6; 5-193285; з, 
Sys = 45.75; 8y.2 = 6.764; bye = 453 Туг 
1. а. By (10.27), ¢ = (453 = Ф(11зу/а1 _ qy 
6.764 
.602У40 
By (10.28), = —Ó902V 0... 7 
| ' ^ v1- (602° f 


tos = 2.70 Reject H :8,, = 0 
Exercise 10.1, page 242 cont.


b. By (10.27), t = (459 — 60118 V4 _ _ | 55 


6.764 
tims = 2.02 Accept H : Byz = 6 
_ (50.83 — 52)У42 
c. By (10.25), t PITE 1.11 


Accept H :u, = 52 

. By (10.32), 261 < By. < .644 
. By (10.31), 48.72 < и, < 52.94 
. By (10.33), 30.9 < o%,.2 < 75.0 
. By (10.34), 45.5 < шу. < 49.8 
3. X = 24.2; Y = 69.0; by = 217 

Ў = 63.8 + 217X; s= = 541 

— .125 < В, < .559 

64.6 < py < 73.4 

33.3 < съ. < 103.2 

Accept H :В = 0 and H : fys 5 0; Reject H : 8, = 5 




Exercise 10.2, page 252


                     N = 5    N = 26   N = 82   N = 102
r = .06
r√(N−2)/√(1−r²)      .104     .294     .538     .601
r√(N−1)              .120     .300     .540     .603
r = .20
r√(N−2)/√(1−r²)      .354    1.00     1.83     2.04
r√(N−1)              .400    1.00     1.80     2.01
r = .60
r√(N−2)/√(1−r²)     1.30     3.67     6.71     7.50
r√(N−1)             1.20     3.00     5.40     6.03


Highly satisfactory agreement for N large and r near zero. 
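The comparison behind this answer can be recomputed directly. The sketch below is an illustration only: the function names are hypothetical, and the sample sizes 5, 26, 82, and 102 are an assumption inferred from the legible r√(N−1) entries (for example, .06√25 = .300).

```python
from math import sqrt

def t_ratio(r, n):
    """Exact test statistic r * sqrt(N - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def normal_ratio(r, n):
    """Large-sample approximation r * sqrt(N - 1)."""
    return r * sqrt(n - 1)

for r in (0.06, 0.20, 0.60):
    for n in (5, 26, 82, 102):
        print(f"r={r:.2f} N={n:3d} exact={t_ratio(r, n):.3f} approx={normal_ratio(r, n):.3f}")
```

The two statistics move together as N grows when r is near zero, and separate as r departs from zero, which is the pattern the answer describes.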


Exercise 10.3, page 257 
1. t = 133 2. t = 2.03 


Exercise 10.4, page 258 
1. a. X̄ = 31.38; Ȳ = 49.15; byx = .652; sx = 7.25;
sy = 9.42; rxy = .502; sy.x = 8.19
b. Ŷ = .652X + 28.69 c. 75%


1,20 


4.08 
4.00 


14.98 


12.00 


d. t= 57 
Exercise 10.4, page 258 cont.
e. If X = 62, Ŷ = 69.11; if X = 38, Ŷ = 53.47; if X = 54, Ŷ = 63.90
f. (1) 53.17 < μy.x < 58.97 (2) .425 < βyx < .870
(3) .336 < ρ < .635 (4) 49.7 < σ²y.x < 86.8
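The regression answers to problem 1 can be reproduced from the summary statistics. This sketch assumes the readings 7.25 and 9.42 for the standard deviations of X and Y and .502 for r, and it rounds the slope to three places before computing the intercept, as the printed values suggest; the function name is hypothetical.

```python
def regression_line(mean_x, mean_y, s_x, s_y, r):
    """Slope and intercept of the regression of Y on X from summary statistics."""
    slope = round(r * s_y / s_x, 3)              # b_yx = r * s_y / s_x
    intercept = round(mean_y - slope * mean_x, 2)  # a = mean_y - b_yx * mean_x
    return slope, intercept

slope, intercept = regression_line(31.38, 49.15, 7.25, 9.42, 0.502)
print(slope, intercept)

# Estimated Y for the three X values named in part e.
for x in (62, 38, 54):
    print(x, round(slope * x + intercept, 2))
```

Carrying unrounded values through the whole computation may change the last printed digit.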


2. a. Ŷ = 20.16 + .812X b. X̂ = 20.60 + .450Y c. .430 < ρ < .737
d. r = .6085; zr = .707
ρ = .6000; ζ = .693
t = (.707 − .693)√119 = .153
Accept H: ρ = .60
е. 1 = — 146 
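The test in part d of problem 2 uses the z transformation of a correlation coefficient, z = atanh(r), whose standard error is 1/√(N − 3). A minimal sketch follows; the function name is hypothetical, and N = 122 is an assumption inferred from the factor √119 in the answer.

```python
from math import atanh, sqrt

def fisher_z_statistic(r, rho, n):
    """Normal deviate for the hypothesis that the population correlation is rho."""
    return (atanh(r) - atanh(rho)) * sqrt(n - 3)

n = 122  # assumed, since N - 3 = 119
print(round(atanh(0.6085), 3), round(atanh(0.60), 3))  # transformed r and rho

stat = fisher_z_statistic(0.6085, 0.60, n)
# A statistic smaller than 1.96 in absolute value leads to acceptance at the .05 level.
print(abs(stat) < 1.96)
```

The observed r is so close to the hypothetical value that the statistic is far inside the region of acceptance.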


3. a. byx bxy = (.96)(1.42) > 1.00
b. Because Σxy > √(Σx² Σy²), r > 1.00
c. r, byx, and bxy must all have the same sign.
d. r² should be equal to byx bxy, but .36 ≠ (.8)(.9).
e. sy.x cannot be larger than sy.
f. Same difficulty as in c.
g. byx = −.9 and bxy = −.6. Same difficulties as in c and d.
h. byx = r sy/sx = (.5)(10)/4 = 1.25, not .25.
i. byx should be (.5)(3)/6 = .25 instead of 1.00.
bxy should be (.5)(6)/3 = 1.00 instead of .25.
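The inconsistencies listed in problem 3 rest on a few relations among r and the two regression coefficients: their product cannot exceed 1, all three share one sign, and the product of the coefficients equals r squared. The sketch below checks these; the function name and tolerance are hypothetical, and r = .6 in the first example is an assumption (the answer shows only the square, .36).

```python
def check_regression_summary(r, b_yx, b_xy, tol=0.01):
    """Return a list of inconsistencies among r, b_yx, and b_xy."""
    problems = []
    if b_yx * b_xy > 1:
        problems.append("b_yx * b_xy exceeds 1, so r would exceed 1")
    if len({value > 0 for value in (r, b_yx, b_xy)}) > 1:
        problems.append("r, b_yx, and b_xy do not all have the same sign")
    if abs(r * r - b_yx * b_xy) > tol:
        problems.append("r squared differs from b_yx * b_xy")
    return problems

print(check_regression_summary(0.6, 0.8, 0.9))    # r squared .36 against product .72
print(check_regression_summary(0.5, 0.25, 1.00))  # corrected values from part i
```

The second call returns an empty list, since (.25)(1.00) equals (.5)².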


24 — £5 68 


Von? 


4 t= 


Exercise 13.1, page 326 
1. Ў = 22.68 + .214X, + АТТХ, 


2. Кула = .642 
ШЕ ҮТЕ 
З.Р Т=Ё 2 33.3 


Фа P= Aym + byraX: + bae + by Xs 
b. Веља + rab yon + Tab" 1.20 = rya 
те + byen + Tad v1.26 = Ty 
Tab ys + табет + OY ут = ту 


с. Ryan = Vr AT yn + Tab ym + rab s 


fy =r, " 
Ба Vua m ee be braa = Pena 
Ty — f, 
ucc x MEAT 
e A m Y bu — by eX КЕШЕ Кы ЕЛЫ 
B 0% + rab yes = Ty ra bysa a yet + ryb yoa 
Tu — Tuts Tu — Turn 


6. а. бу = EES Y b. b'i = izh © buy = btaa 4. bra = Vas 


eA-2X,- bu;Xi — bak; f. Rin = Ут + Turbia 
& bar + rnb ins = rs табыл + а = ry 


Exercise 13.2, page 345 
E Ta — fan 
1. а. газ ГЕТЕ 


6 ту" TEE 
1- ۷1 = т 

Tua — Tw us 
V1 = Fg. (V1 — hus 
Талым — Ter 207 т.а 


2. т.м = 


Шии УГЕ Poa = Pa 
betas = Ead’ ИННИ


У-У T 


A. талы = Du.sebu ne 


5. 1 − R²1.234 = (1 − r²12)(1 − r²13.2)(1 − r²14.23)
= (1 − r²14)(1 − r²13.4)(1 − r²12.34), etc.
Acceptance, of false hypothesis, 61; region
of, 46; values, 47, 50 

Addition of probabilities, 15 

Adjusted means in analysis of covariance, 
397 

Algebraic relations in analysis of covari- 
ance, 388; in analysis of variance, 210 

Alpha, error, 60; as subscript indicating 
individual, 230 

Alternatives to hypothetical mean, 161; 
to hypothetical proportion, 60 

Analysis of covariance, 387 ff. ; adjusted 
means, 397; hypothesis of common 
slope, 390; hypothesis of no difference 
between regression estimates, 398, 406; 
hypothesis of single regression line, 393; 
hypothesis of zero slope, 393; line of 
non-significance, 410; matched regres- 
sion estimates, 398, 406; point of non- 
significance, 399; predictor variables, 
387, 404; region of significance, 400, 407; 
regression among means, 396; regres- 
sion in separate groups, 390; regression 
in total group, 390; subdivision of 
cross-products, 387; subdivision of sum 
of squares, 396, 405; symbolism, 387 

Analysis of variance, 196 ff., 348 ff.; algebraic
relations, 210; component sum
of squares, 208, 224; critical region, 205; 
error variance, 220, 358; F test, 204-08; 
Graeco-Latin square, 373; hypotheses, 
201, 349; interaction, 349; Latin square, 
373; main effects, 371; matched groups, 
382; mathematical model, 201, 360; 
orthogonal comparisons, 356; by ranks, 
438; in regression, 244, 324; scores not 
available, 226; several measures on 
each individual, 216, 363; subdivision
of sum of squares, 211, 355; two bases 
of classification, 349; unequal frequen- 
cies in subclasses, 381; variance ratio, 
204 

Angular transformation, 423; Table XIX,
479 

Arcsine transformation, 423; Table XIX,
479 


Attenuation, 300 
Back solution, 328, 329, 335 


Bartlett’s test for homogeneity of variance, 
193 


Beta error, 60; regression weight, 239 

Biased estimate, 119, 240; sample, 171 

Binomial distribution, 21; aids for com- 
puting probabilities in, 57; Tables IV 
and V, 458-60; approximation by nor- 
mal curve, 31; graphic representation, 
20, 24, 25, 26, 38, 41, 47; mean of, 29; 
standard deviation of, 29 

Biserial correlation, 261, 267; comparison 
with point biserial, 271; transformation 
to normal variate, 269 

Bivariate normal distribution; see Normal 
bivariate distribution 


C values in analysis of covariance, 389; 
in solution of normal equations, 333 

Central limit theorem, 143 

Chi-square, 85; comparison with Kolmo- 
gorov-Smirnov test, 443; curves, 86; 
exact probabilities, 86; graphic repre- 
sentation, 87, 88, 89, 134, 135; observed 
and expected frequencies, 93; small expectations,
106; table, 91; Table VIII,
464; in tests of independence, 95; Yates’ 
correction, 105 

Classes, 9; exhaustive, 14; mutually ex- 
clusive, 14 

Clopper-Pearson chart, Chart VI, 461 

Cluster sampling, 175 

Coded scores, 237 

Comparisons, orthogonal, 356 

Components of a score, 293; of sum of 
squares in analysis of variance, 224, 351, 
355, 369; of sum of squares and prod- 
ucts in analysis of covariance, 387; of 
sum of squares in regression, 242, 322 

Computational routines: analysis of vari- 
ance, 353, 366, 377; chi-square test of 
normality, 121; coefficient of concord- 
ance, 283; coefficients of correlation 
and regression by use of a machine, 234; 
coefficients of correlation and regres- 
sion from a scatter diagram, 237; corre- 
lation ratio, 279; Doolittle method in 
multiple regression, 328-35; Kelley- 
Salisbury iterative method in multiple 
regression, 346; Kuder-Richardson
method for reliability coefficient, 311; 
mean and standard deviation of a sam- 
ple from grouped data, 117 

Conceptual population, 9 
Concordance, coefficient of, 284, 439

Confidence, coefficient, 53; interval, 53; 
limits, 53 

Confidence and probability, 55 

Conic, 407 

Consistency of estimate, 52, 414 

Contingency, coefficient of, 287 

Contingency table, 95; more than two 
classes in each variable, 96; same in- 
dividuals in both variables, 102; test of 
independence in, 95; two classes in each 
variable, 100; two classes in one varia- 
ble, 99; very small samples, 103 

Corner test of association, 447 

Correlation, biserial, 267; coefficient of 
contingency, 287; Flanagan’s estimate 
of, 275; multiple, 321; partial, 340; 
phi, 272; point biserial, 262; ranks by 
several judges, 283; ranks by two 
judges, 278; ratio, 276; tau, 286; tet- 
rachoric, 273 

Correlation coefficient (in bivariate popu- 
lation), 233; computation by machine, 
233; computation from scatter diagram, 
237; confidence interval for, 255; con- 
sistency among coefficients, 344; cor- 
rection for attenuation, 300; inefficient 
estimate of, 420; in normal bivariate 
population, 248; standard error of, 252; 
t test for, 251; test that two are equal, 
255; z transformation, 254 

Covariance, 248; analysis of, see Analysis 
of covariance 

Critical region, 46; choice of, 62-65; one- 
sided and two-sided, 64, 162; related to 
power of test, 62 

Cumulative distribution, 26, 427 


Degrees of freedom in general, 90; chi- 
square, 90; F distribution, 185; ran- 
dom sample, 130; "Student's" distri- 
bution, 146 

Density, probability, 25 

Design of experiments, 216, 348, 363, 373, 
375; of samples in surveys, 171 

Dichotomous population, 19; scale, 261 

Difference between two correlation coeffi- 
cients, 255-57; between two means, 
REN; between two proportions, 77- 

Discrete variable, 8 

Disproportionate subclass frequencies, 381 

Distribution, population, 9; bivariate, 
230, 246; multivariate, 337; normal, 
110; normal bivariate, 248; of several
classes, 80; of two classes, 19 

Distribution, sampling, 5, 118; binomial, 
21; chi-square, 84, 132; correlation 
coefficient, 249, 251, 255; F, 140; non-central
t, 265; normal, 132; rank correlation,
281; “Student’s” or t, 135
Distribution-free methods, 426; see also 
Non-parametric methods 
Doolittle method, 326; Fisher modifica- 
tion, 331 
Double subscripts, 196, 290 
Durost-Walker correlation chart, 239 


e, 110 

Efficiency, 144, 414 

Ellipses of bivariate normal distribution, 
250 

Error in measurement, 292; two kinds of, 
60; in analysis of covariance, 393, 395; 
in analysis of variance, 220, 358, 374, 
380, 382; see also Measurement error 

Estimation, 51; in analysis of variance, 
202, 220, 358, 361, 370; consistent, 51; 
efficient, 144; of error variance, 298; 
inefficient, 415; by an interval, 53; of 
mean, 144, 148-51, 416, 417; of point 
biserial correlation, 265; of regression 
statistics, 241, 339; of reliability coeffi- 
cient, 307-14; of p, 255, 270, 273, 275, 
420; by a single value, 51, 415; of 
standard deviation, 416, 417; of true 
variance, 297; unbiased, 51, 119; of 
variance, 119, 180 

Eta, 278 

Expectation, 27; of frequency, 29; of 
mean score in measurement, 290; of 
mean square, 226, 361, 362; of pro- 
portion, 30; of sample mean, 119; of 
sample variance, 119; of a variable, 
109; symbol for, 28 

Expectations, small, 106 

Expected value, see Expectation 

Experimental design, 216, 348, 362, 373


F ratio, 140; in analysis of variance, 205; 
comparison of variances, 185; critical
region, 180, 205; degrees of freedom, 
141; graphs, 141, 209, 210; independ- 
ence in, 140; relation to chi-square, 
189; relation to correlation ratio, 278; 
relation to t, 188; relation to z, 189; 
tables, 185; Table X, 466; transformation
to normal, 424; reading tables of,

Fmax distribution, 192; Table VII, 462

Factorial, 18 

Finite population, 70, 167 

Fisher and Yates Tables, 186, 195 

Flanagan’s estimate of p, 275, 287; Table 
XIII, 472 

Fourfold correlation, 271; contingency
table, 100

Frame, 171


Graeco-Latin square, 373 
Graphical representation of acceptance 
values, 48; binomial, 20, 24, 25, 38, 41, 
47; chi-square distribution, 87, 89, 134; 
confidence interval for cumulative dis- 
tribution, 442; confidence interval for 
E(d), 446; confidence interval for μ,
150; confidence interval for μx − μy, 445;
confidence interval for P, Chart VI, 461;
confidence interval for ρ, Chart XIV, 476;
confidence interval for σ², 182; cumulative
binomial, 26, 41; cumulative
chi-square, 87, 88, 135; cumulative F,
141; cumulative normal, 41, 136; cu- 
mulative "Student's" or t, 136, 138, 139; 
F distribution, 131, 208, 209; inde- 
pendence of components of sum of 
squares in analysis of variance, 210; 
independence of X and s, 131; normal, 
38, 75, 120; normal bivariate ellipses, 
250; power function, 65, 73, 162; proba- 
bility in bivariate distribution, 247; 
region of rejection and rejection values 
for P, 48, 50; relations in analysis of 
covariance, 392, 394, 396, 398, 399, 404, 
409, 411; sampling distribution of r, 253 


High and low groups of test scores, 417;
estimation of correlation from, 275, 420;
Chart XV, 477; Table XIII, 472; estima- 
tion of mean and standard deviation 
from, 417 

Homogeneity of variance, 191-95 

Hypothesis, 4; choice of critical region, 
64, 162; critical region, 46, 62; level of 
significance, 46; one-sided and two- 
sided regions, 60; power of a test, 62, 
162; procedures in testing, 5; sample
size in test of, 72, 163 

Hypothetical population, 10 


Independence, in contingency tables, 95- 
106; of components of sum of squares, 
208; of mean and standard deviation, 
130; of observations, 14; among ranks 
of several judges, 283; between ranks
of two judges, 278, 286; non-parametric 
test, 446 

Inefficient estimate, 275, 415 

Interaction, 349, 362, 369; test that inter- 
action is zero, 353, 370; used as error, 
360 

Internal consistency of a test, 311 

Interval estimate, see Confidence, interval 

Inverse matrix, 332 

Iterative method in regression, 345 


Johnson-Neyman method, 384, 406 
k sample tests, analysis of variance, 196 ff.,
348 ff.; analysis of variance by ranks, 
438; homogeneity of variance in k 
samples, 191; median test, 435; sum of 
ranks test, 436 

Kelley-Salisbury method, 345 

Kolmogorov-Smirnov test, 426, 440; com- 
parison with chi-square test, 443 

Kuder-Richardson formula, 311 


Latin square, 373; interaction in, 373; 
several observations in each cell, 377 

Layout, three-way, 363; two-way, 360 

Least squares, 325 

Level of significance, 46 

Linear correlation, see Correlation coeffi- 
cient 

Linear function, 132 

Linearity of regression, test for, 244; see 
also Multiple regression 

Location, tests of, 430 

Logarithm, Table XXII, 482; transforma- 
tion, 424 


Matched groups, 382 

Matched regression estimates, 398, 406 

Mathematical model, 109; analysis of 
covariance, 387; analysis of variance, 
201, 217, 361, 362, 369; biserial correla- 
tion, 265; bivariate correlation, 246; 
point biserial correlation, 265; regres- 
sion, 239, 337 

Matrix, 196; explanation of Fisher- 
Doolittle method, 332 

Maximum likelihood, 415

Mean, adjusted, in analysis of covariance, 
397; components of, in analysis of var- 
iance, 218, 349, 362, 369; computation 
in grouped data, 115; confidence inter- 
val for, 148; deviation, 420; difference 
of, see Difference between two means; es- 
timate based on percentiles, 416; Table 
XVII, 478; of a probability distribu- 
tion, 27; of a sample, 114; probability 
distribution of, 132, 144; standard error 
in cluster sampling, 144; standard error 
in random sampling, 144; standard error
in sampling from finite universe, 167; 
standard error in stratified sampling, 
174; tests of hypotheses, 143 ff., 196 ff., 
443 

Mean square, 202; distribution of, 204; 
independence of, 208; population value 
estimated by, 362, 370; ratio of two, 205 

Measurement, 289 ff.; components of a 
score, 293; effect of error on correlation,
299; effect of error on the
mean, 295; effect of error on regres- 
sion, 305; effect of error on tests of 
significance, 306; effect of error on the
variance, 297; error, 292; reliability, 
297; symbolism, 291; “true” variance, 
297; variance, 298 

Median, 413; efficiency of, 415; interval 
estimate for, 440; standard error of, 
414; symbolism, 413 

Median test for k samples, 436; for two
samples, 435 

Multiple correlation, 321; computation, 
326; relation to partial correlation, 344; 
relation to regression, 322; test of hy- 
pothesis, 324 

Multiple regression, 318, 324; analysis of 
variance in, 324; effectiveness of pre- 
diction, 321; elimination of a predictor, 
339; normal equations, 325; partition 
of sum of squares, 322; tests of signifi- 
cance, 337; two predictors, 318 

Multiplication of probabilities, 15 


Non-central t distribution, 265

Non-parametric methods, 426 ff.; com- 
parison of k samples by analysis of 
variance on ranks, 438; by median test, 
435; by sum of ranks, 436; comparison 
of two samples by extended sign test, 
431; by Kolmogorov-Smirnov test, 427; 
by median test, 435; by run test, 428; 
by sign test, 430; by signed rank for 
paired observations, 432; by sum of 
ranks, 434; confidence interval for cu- 
mulative frequency, 441; confidence 
interval for E(d), 445; confidence in- 
terval for median, 440; confidence 
interval for μx − μy, 443; corner test of
association, 447 

Non-significance, line of, 410; point of,
399

Normal bivariate distribution, 248; ellipses
of equal probability, 249; parameters
in, 248; probability density of,
248; regression equations in, 258 

Normal distribution, 32; approximation 
to binomial, 37; to chi-square, 183; to 
distribution of r, 249; curve, 32; description,
33; graphs of, 111, 120, 136;
reading table of, 31; related to chi-square,
101; to F, 189; to t, 136, 145;
Tables I, II, III, 454-57

Normal equations, 325; Doolittle method, 
326; Fisher-Doolittle method, 331; 
Kelley-Salisbury method, 345 

Normality, assumption of, 130, 143, 201, 
239, 265, 267, 273, 275, 280, 416, 423, 
426; test of, 119 

Normalization of distribution, of correlation,
254, 421; Table XXI, 481; of F,
424; of ranks, 424 


Number of cases needed in sample, 69-76; 
163—67 


Observations, 8; independent, 14; paired, 

151, 190, 430, 432; random, 10 
One-sided and two-sided tests, 60 
Orthogonal comparisons, 356 


Parameter, 20; known, 239 

Parent population, 426 

Partial correlation, 340; order of correla- 
tion, 342; relation to multiple correla- 
tion, 344 

Percentile, 413; distribution of, 414; es- 
timation of mean by percentiles, 416, 
417; notation, 413; of standard devia- 
tion, 416, 417; standard error of, 414 

Phi coefficient, 272; comparison with tet- 
rachoric, 274; relation to chi-square, 
272 

Point biserial, 262; comparison with bi- 
serial, 271; confidence interval, 266 

Poisson distribution, 424 

Population, 1, 9, 109; mean and variance 
of, 109; see also Distribution, popula- 
tion 

Power function, 63, 65, 73, 162; of test, 
62, 162; relation to level of significance, 
63; relation to size of sample, 72, 163 

Predictor variable, 315, 318, 324, 387 

Probability, 2, 3; and confidence, 55; 
density, 25; laws of combination, 15; 
symbolism, 13 

Probability distribution, 4, 8; of statistic, 
5; see also Distribution 

Product moment, 248 

Proportion, 9, 43; in binomial distribu- 
tion, 21; confidence interval for, 53, 68, 
72; Chart VI, 461; difference between 
two, 76; distribution in large samples, 
37, 67; estimation by a single value, 51; 
in finite population, 70; hypotheses, 44— 
51; power function, 73; sample size for 
confidence interval, 69; sample size in 
test of hypothesis, 72; transformation 
to angles, 423; two kinds of error, 60 


Quantile, 413 
Quartile, 413 


Random, numbers, 126; observations, 10; 
sample, 10, 171 

Randomized block experiment, 363 

Rank correlation, 278; efficiency of, 283; 
as estimate of ρ, 282; relation to product
moment, 281; test of significance,
281

Ranks, analysis of variance by, 438; cor- 

relation between, 278, 284, 286; signed 


rank test, 432; sum of ranks test, 434, 
436; transformation to normal, 424; 
uniformization, 424 

Reciprocals, Table XXI, 481 

Region, of acceptance, 46; critical, 46; 
of rejection, 46; of significance, 46, 400, 
407; size of, 46 

Regression in bivariate population, among 
means, 396; in analysis of covariance, 
388, 398; analysis of variance in, 244; 
computation by machine, 233; compu- 
tation from scatter diagram, 237, 239; 
confidence intervals, 241; and correla- 
tion, 233; effect of measurement error 
on, 305; equation, 231; estimate, 231; 
linearity, test of, 245; list of formulas, 
259; mathematical model, 239; parti- 
tion of sums of squares, 242; precision 
of estimate, 244; residual error, 232; 
standard error of estimate, 240 

Regression, multiple, see Multiple regres- 
sion 

Rejection, region of, 46; values, 47 

Reliability, coefficient of, 297, 301; effect 
on correlation, 303; estimate from data, 
307; Kuder-Richardson formula, 311; 
relation to length of test, 302; Spearman-Brown
prophecy formula, 303

Residual, 236, 240, 242, 322, 393, 394 

Run test, 428 


Sample point, 50 

Sampling, biased, 172; cluster, 173, 175; 
design in surveys, 171; multistage, 173; 
random, 10, 172; simple, 171; stratified, 
173; systematic, 172; unit, 171 

Sampling distribution, see Distribution, 
sampling 

Score, components of, 293 

Sign test, 6, 430 

Signed test for paired observations, 432 

Significance level, 46; region, 46; relation 
to power, 63 

Size of region of significance, 46 

Slope of line of regression, 232 

Spearman-Brown prophecy formula, 303 

Square root, Table XXI, 481; transformation,
424

Squares, Table XXI, 481 

Standard deviation, binomial distribution, 
29; computation of, 115; estimation 
from item analysis, 417; estimation by 
percentiles, 416; of probability distri- 
bution, 28; of sample, 114 

Standard error, of correlation coefficient, 
252; of correlation coefficient estimated 
from rank correlation, 282; of a differ- 
ence in two means, 153, 154, 156, 157; 
of a mean, 144, 145, 153, 157, 168, 174, 
177; of a median, 414; of a percentile,
414; of a proportion, 30, 71, 175, 177;
of a regression coefficient, 241; of the 
z-transformation, 254, 269 

Standard error of ашаа, 240, 299 

Statistics, 5 

Statistical inference, 2 

Stratification, 173 

“Student’s” distribution, 135, 145; Table 
IX, 465 

Subscripts, 112, 196, 230 

Sum of ranks test, 434, 436 

Sum of squares, subdivision of, in analysis 
of covariance, 396, 405; in analysis of 
variance, 211, 355; in regression, 242,

Summation sign, 111 

Supply, see Population 

Surveys, design of samples in, 171 

Symbolism, see Glossary of symbols in 
Appendix 

Systematic selection, 172 


Tables giving basic data, marks of 42 
first-term statistics students and scores 
on a prognostic test, 235; preregistra- 
tion test scores of 447 college students on 
English test, Table XXIV, 486; scores of
11 archers in a 3 X 3 layout, 364; scores 
of 36 subjects on memorization of piano 
music, classified on 4 criteria, 379; 
marks of 98 first-term statistics students 
on 3 criteria and scores on 3 predictor 
variables, 316; scores of 48 pupils in 
remedial reading classified in a 4 X 4 
layout, 350; scores for comparison of 
2 groups by matched regression esti- 
mates on 1 predictor variable, 388; sta- 
tistics for comparison of 2 groups by 
matched regression estimates on 2 pre- 
dictor variables, 411 

"Tables giving formulas or symbols, com- 
ponent sums of squares and related 
degrees of freedom in analysis of covari- 
ance, 396; formulas for estimating mean 
from percentiles, Table XVII, 478; 
formulas for estimating standard devia- 
tion from percentiles, Table XVIII, 478; 
population values estimated in a two- 
way layout in analysis of variance, 362; 
population values estimated in a three- 
way layout in analysis of variance, 370; 
symbols for mean and variance of a sub- 
group, 199; symbols for scores of many 
individuals on many traits, 291 

Tables giving significance values, absolute 
values of smaller rank sum, 433; max- 
imum difference between two sample 
cumulative distributions, 427, 428;
NDa for use in Kolmogorov-Smirnov
test, 442; product moment r, Table XI, 
470; rank order r, Table XVI, 478 

Tables of probability, binomial, Tables 
IV, V, 458-60; chi-square, Table VIII, 
464; F ratio, Table X, 466; Fmax, Table 
VII, 462; normal, Tables I, II, III, 
454-56; sampling distribution of ob- 
served proportion р for N = 10 and 
various values of P, 49; "Student's" 
or t, Table IX, 465 

Tables to facilitate computation, loga- 
rithms, Table XXII, 482; r from pro- 
portions of success and failure, Table 
XIII, 472; r from tails of distribution, 
Chart XV, 477; random numbers, 
Table XXIII, 484; squares, square 
roots, and reciprocals, Table XXI, 481; 
transformation of proportions to radi- 
ans, Table XIX, 479; transformation 
of r to z, Table XII, 471; transformation
of ranks to standard scores, Table XX, 
480 

Tau, 286 

t distribution, 135, 145; Table IX, 465 

Test, of hypothesis, 6; see also Hypothe- 
sis; mental test, 289; see also Measure- 
ment 

Tetrachoric correlation, 273; comparison 
with phi, 274 

Transformation, of biserial correlation, 
269; of correlation coefficient, 254; of 
F, 425; inverse sign, 424; logarithmic, 
424; of proportions, 423; of ranks, 424; 
square root, 424; uniformization, 425; 
2, 254 
True component of score, 293; variance of,
297 
Two sample tests, chi-square test, 99-104; 
distribution free, 427; F test for comparison
of variances, 185; Kolmogorov-Smirnov
test, 426; median test, 435;
run test, 428; sign test, 430; signed test
for paired observations, 432; sum of
ranks test, 434; t test, 151-60, 166;
test that ρ1 = ρ2, 255
Two-tailed test, see One-sided and two-sided
tests


Unbiased estimate, 119 

Unequal subclass frequencies, 381 
Uniformization, 425 

Universe, see Population 


Variable, continuous, 8; discrete, 8 

Variance, analysis of, comparison of sev- 
eral variances, 191; comparison of two 
variances from independent samples, 
185; comparison of two variances based 
on related scores, 190; confidence in- 
terval, 180; of distribution, 28; of 
proportion, 29; relation to chi-square, 
178; of a sample, 119; unbiased esti- 
mate, 119; see also Analysis of variance 


Yates’ correction, 105 


z transformation of correlation, 254; Table
XII, 471

z* transformation of biserial correlation, 
269 